Quick start: do you have MCP?
If you are an AI agent and your client (Claude Desktop / Cursor / Claude Code) supports MCP, prefer calling tools from the IPOS MCP server (https://ipdesk.ai/mcp) over reading this entire document:
- list_ai_actions instead of grepping the auto-generated agent table
- list_platform_routes instead of the route catalog section
- create_app / upload_zip / poll_deployment instead of curl commands
- get_skill to come back and read this when you need decision-level context
If MCP is not available, continue reading — every operation in this doc maps to a documented /api/ext/v1/... endpoint that you can hit with a Bearer token.
IPOS SDK Converter
Convert any existing app into an IPOS OS static app via two structured handoffs.
Core principle: every app connected to IPOS OS must be Agent-controllable. The OS Agent must be able to read the app's complete runtime state and invoke every user-visible control through a declared control surface. An app that only opens in an iframe is not fully converted.
When to use
Invoke this SKILL when:
- A customer says "I have an existing app and want it on IPOS OS" (or, in Chinese, 想把我的工具放到 IPOS: "I want to put my tool on IPOS")
- Someone needs to scaffold a new app in apps/<slug>-app/ in the email-processor repo
- An IPOS app deploy is failing (see references/gotchas.md first)
- A new platform route is being designed (Q3 of the diagnosis tree)
Architecture in one paragraph
In addition to the three execution buckets below, every converted app also has a mandatory Agent Control Surface: complete app state plus commands for every user-visible control. Treat it as a required control bucket even when the app is otherwise pure static zip code.
Every app capability lands in exactly one of three execution buckets: zip (JS/TS/React/Pyodide, runs in browser, no server), platform route (server-side endpoints maintained by IPOS — OCR, CORS proxies, etc.), or platform-js SDK (IPOS data read/write via postMessage to the shell). Customer servers: never. If a server-side capability is needed and no platform route exists, customer files a Platform Route Request → IPOS builds it → customer's zip never changes.
| Bucket | What lives here | Maintained by |
|---|---|---|
| zip | JS/TS logic, React UI, Pyodide Python | Customer (static, no server) |
| Platform route | Server-side capabilities (OCR, CORS proxy, AI calls) | IPOS (shared, centrally maintained) |
| platform-js SDK | IPOS data read/write, session identity | IPOS |
| Agent Control Surface | Complete app state snapshot + commands the Agent may invoke | Customer defines; IPOS shell transports/exposes |
Session Persistence (opt-in)
Apps may declare "session_persistence": true in their manifest to participate
in cross-device session restore. Doing so adds two standard postMessage
contracts: app → OS sends { kind: 'ipos:request', type: 'session.snapshot', payload: { state, schemaVersion } }; OS → app sends
{ type: 'ipos.session.restore', windowId, payload: { state, schemaVersion } }
once after the iframe handshake when a prior snapshot exists. Window geometry
is restored automatically; only in-app state (form drafts, current view,
pinned items) requires opt-in. See references/session-persistence.md for the
full contract.
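The two messages above can be sketched in TypeScript as follows. This is a minimal illustration, not the platform-js implementation: the helper names buildSnapshotMessage and isRestoreMessage are hypothetical, the field types (numeric schemaVersion, string windowId) are assumptions, and only the message shapes are taken from the paragraph above. references/session-persistence.md remains the authoritative contract.

```typescript
// Message shapes copied from the contract described above.
// ASSUMPTION: schemaVersion is a number and windowId is a string.
interface SnapshotMessage<S> {
  kind: 'ipos:request'
  type: 'session.snapshot'
  payload: { state: S; schemaVersion: number }
}

interface RestoreMessage<S> {
  type: 'ipos.session.restore'
  windowId: string
  payload: { state: S; schemaVersion: number }
}

// Hypothetical helper: wrap in-app state in the app -> OS snapshot envelope.
function buildSnapshotMessage<S>(state: S, schemaVersion: number): SnapshotMessage<S> {
  return { kind: 'ipos:request', type: 'session.snapshot', payload: { state, schemaVersion } }
}

// Hypothetical type guard for the OS -> app restore message.
function isRestoreMessage(data: unknown): data is RestoreMessage<unknown> {
  if (typeof data !== 'object' || data === null) return false
  const msg = data as Record<string, unknown>
  const payload = msg.payload as Record<string, unknown> | null | undefined
  return (
    msg.type === 'ipos.session.restore' &&
    typeof payload === 'object' &&
    payload !== null &&
    'state' in payload &&
    'schemaVersion' in payload
  )
}
```

An app would typically compare the restored schemaVersion with its own and ignore snapshots it no longer understands, rather than trying to load state written by an incompatible build.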
How to run this SKILL
With a non-technical audience: Open part1-diagnosis.md, walk through the feature inventory and decision tree together. Output is a filled SDK Spec Sheet.
With a developer: Open part2-implementation.md. Hand them the SDK Spec Sheet from Part 1. They scaffold + implement + deploy.
Before designing a new route: Open platform-route-catalog.md to see what already exists, then references/route-design-patterns.md for the recipes.
Before finishing any app: Open references/agent-control-surface.md. Every app needs a complete state contract, action contract, manifest declarations, and verification.
Before deploy (Part 2 final step): Open references/flow-compliance-report.md and produce a plain-language HTML report for the customer comparing their original flow vs. the IPOS flow, with a compliance checklist. Ask the customer where to put it (default docs/<slug>-flow-compliance.html); do not bundle it into the zip.
When something breaks: Open references/gotchas.md — every issue hit during real deploys is catalogued there with root cause and fix.
For a worked example (Pattern A/B, external API proxy): references/g1001-walkthrough.md walks through converting a Streamlit Python app (g1001) into an SDK app. It covers all three buckets, including a CORS-blocked external API that required two new platform routes (OCR and a USPTO proxy) plus a Cloudflare Worker to bypass IP blocking.
For a worked example (Pattern E — AI analysis + file storage): references/scan-doc-walkthrough.md walks through converting scan-doc-organizer (a Streamlit app that uses Gemini Vision to split combined scanned PDFs) into an SDK app. Covers: sending PDF inline to Gemini without rendering, SchemaType enum gotcha, platform.appFileUpload() to save to 檔案總管, credentials: 'include' for direct platform route calls, and author_org_id setup for org-owned apps.
Hard rules (don't violate)
- No customer-managed servers. If a feature needs server-side execution, it goes through a platform route. Period.
- No direct DB credentials in app code. All IPOS data goes through platform.db() or platform routes.
- Bucket every capability before writing code. The SDK Spec Sheet is the contract; without it, you're guessing.
- Pyodide is a zip-bucket option for pure Python only. It is never a workaround for missing platform routes: Pyodide can't make CORS-blocked HTTP calls or hold secrets.
- Bump version in platform.json for every redeploy. Same version + new zip = same deploymentId returned (idempotency); the old failed deploy will not be retried.
- Agent control is mandatory. Every app must expose complete runtime state and every user-visible control through the Agent Control Surface. Server apps may use /api/agent/*; static apps must use platform-js postMessage state/action bridging through the OS shell.
- No hidden UI-only state. If a user can see or change it, the Agent must be able to observe it and trigger the equivalent control unless the SDK Spec Sheet explicitly marks it as sensitive/non-controllable with a reason.
Quick reference: the deploy commands
# Build (use bash, NOT PowerShell Compress-Archive — see gotchas.md)
cd apps/<slug>-app
npm run build # produces out/
zip -r ../<slug>-<version>.zip platform.json out/
# Deploy via API (more reliable than DevConsole UI)
TOKEN='ipos_live_<your-dev-token>'
curl -X POST "https://ipdesk.ai/api/ext/v1/apps/<slug>/deploy" \
-H "Authorization: Bearer $TOKEN" \
-F "zip=@../<slug>-<version>.zip" \
-F "manifest=__from_zip__"
# Returns: { "deploymentId": "...", "status": "building" }
# Poll until live (usually 5-15 seconds)
curl "https://ipdesk.ai/api/ext/v1/deployments/<deploymentId>" \
-H "Authorization: Bearer $TOKEN"
AI Centralization (MANDATORY — read this first if your app touches AI)
Core rule: Every AI call from a customer app MUST go through IPOS's centralized ai_actions framework. No customer app may bring its own AI provider key, install a provider SDK (@google/generative-ai, @anthropic-ai/sdk, @mistralai/mistralai, openai direct, etc.), or fetch api.openai.com / api.anthropic.com / api.mistral.ai / generativelanguage.googleapis.com directly.
This is enforced two ways:
- CI-side: npm run lint:ai-gate runs 3 ESLint custom rules over app/api/platform/v1/** and red-lights direct SDK imports / provider env vars / forbidden hostnames.
- Runtime-side: AI calls go through executeAction(slug, orgId, ...) or executeImageAction(slug, orgId, ...) from @/lib/ai/actions. Those resolve provider/model/key from the ai_providers + ai_actions DB tables.
Why this matters
- Admin manages everything in DB. Adding a new AI agent = INSERT into ai_actions, not a code change. Toggling a provider = admin UI click.
- Customers don't pay token cost. IPOS absorbs it. No quota to track in your app.
- Model swapping is invisible. If we move legacy_ocr from Mistral to a cheaper alternative tomorrow, your app keeps working: same slug, different underlying model.
- Per-org governance. Admin can disable any slug for any org via /admin/ai/grants if they're abusing it. Your app sees a clean error and surfaces it to the user.
Step 1 — Discover existing AI agents BEFORE inventing new ones
Customer agents writing apps must fetch the live agent catalog before declaring uses_ai_actions:
curl https://ipdesk.ai/skill/ipos-sdk-converter/raw
The bottom of that markdown contains an auto-generated table of every ai_actions slug currently registered (with name / description / input_type / output_shape / example_use_case — model_id is intentionally hidden). Reuse an existing slug whenever possible. Inventing a duplicate slug pollutes admin's slug namespace and forces them to wire up a provider/model for a redundant agent.
Step 2 — Declare slugs in platform.json
In your app's platform.json:
{
  "slug": "my-app",
  "version": "1.0.0",
  "ui_entry": { ... },
  "uses_routes": ["/api/platform/v1/ocr"],
  "uses_ai_actions": ["legacy_ocr", "scan_doc_classify"]
}
Both v1 (ManifestSchema) and v2 (BundleManifestV2Schema) accept this field; default is []. Validation: every slug must be a non-empty string. Schema enforces z.array(z.string().min(1)).optional().default([]).
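As a rough illustration of that rule, the sketch below mirrors it in plain TypeScript so a manifest can be checked locally before upload. It does not use zod and is not the platform's ManifestSchema; normalizeUsesAiActions is a hypothetical helper name introduced only for this example.

```typescript
// Hypothetical pre-upload check mirroring the described rule:
// the field is optional, defaults to [], and every entry must be a non-empty string.
function normalizeUsesAiActions(manifest: { uses_ai_actions?: unknown }): string[] {
  const value = manifest.uses_ai_actions
  if (value === undefined) return [] // field omitted -> default []
  if (!Array.isArray(value)) {
    throw new Error('uses_ai_actions must be an array of strings')
  }
  for (const slug of value) {
    // Same check as z.string().min(1): a string with at least one character.
    if (typeof slug !== 'string' || slug.length < 1) {
      throw new Error(`invalid ai_action slug: ${JSON.stringify(slug)}`)
    }
  }
  return value as string[]
}
```

Running a check like this in the build step catches an empty or mistyped slug list before the upload endpoint rejects the manifest.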
Step 3 — What happens at install time
Behavior depends on whether the upload is admin path (uploading via /admin/os) or org path (uploading as an org user):
| Scenario | Slug exists in ai_actions | Slug missing | Grants written |
|---|---|---|---|
| Admin path (author_type='platform') | ✅ install proceeds | ❌ install REJECTED with: Official app references unknown ai_action slug(s): X. Add them via /admin/ai/actions before installing. | None (Q6 bypass: platform apps skip the grant layer entirely) |
| Org path (author_type='org') | ✅ install proceeds | ✅ stub ai_actions row auto-created with is_enabled=true, auto_created_from_app=app.id, NULL provider/model. /admin/ai/actions shows it with a yellow "從客戶 app 自動建立 — 請補上 provider/model" banner | ai_action_grants(org_id, action_slug, status='enabled') upserted for every slug |
Practical implication: when you upload via the org path with a brand-new slug, the slug exists but isn't actually runnable yet — the auto-created stub has no provider/model. Calling executeAction('your_new_slug', orgId, ...) throws Action 'your_new_slug' has no provider/model configured. Admin must visit /admin/ai/actions and wire up provider + model before the agent works.
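The install-time behavior in the table can be restated as a small decision function. This is only a sketch of the rules as described above, not the installer's code: planInstall, InstallPlan, and their field names are hypothetical.

```typescript
type AuthorType = 'platform' | 'org'

// Hypothetical summary of what an install would do for a set of declared slugs.
interface InstallPlan {
  rejected: boolean        // true only on the admin path when a declared slug is unknown
  missingSlugs: string[]   // declared slugs not yet present in ai_actions
  stubsToCreate: string[]  // org path: stub rows created without provider/model
  grantsToUpsert: string[] // org path: one grant per declared slug
}

function planInstall(authorType: AuthorType, declared: string[], existing: Set<string>): InstallPlan {
  const missingSlugs = declared.filter((slug) => !existing.has(slug))
  if (authorType === 'platform') {
    // Admin path: unknown slugs block the install; no grants are written.
    return { rejected: missingSlugs.length > 0, missingSlugs, stubsToCreate: [], grantsToUpsert: [] }
  }
  // Org path: unknown slugs become stubs, and every declared slug gets a grant.
  return { rejected: false, missingSlugs, stubsToCreate: missingSlugs, grantsToUpsert: [...declared] }
}
```

Note that a non-empty stubsToCreate list on the org path means the install succeeds but those slugs stay unrunnable until an admin assigns a provider and model.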
Step 4 — Calling the AI from a platform route (server-side only)
Platform routes (under app/api/platform/v1/**) call AI like this:
import { NextRequest, NextResponse } from 'next/server'
import { executeAction, executeImageAction } from '@/lib/ai/actions'
import { buildRequestContext } from '@/lib/ipos/request-context'
// NOTE: err(code, message, status) is the route's error-envelope helper; its import is not shown in this excerpt.
// userMessage and pdfBase64 are assumed to have been parsed from the request body.

export async function POST(req: NextRequest) {
  const rc = await buildRequestContext(req)
  if (!rc.userId || !rc.orgId) return err('AUTH', 'Unauthorized', 401)

  // Text-in / text-out:
  const r = await executeAction(
    'classify',
    rc.orgId,
    userMessage, // user-side input
    undefined, // optional system prompt override
    { callerAuthorType: rc.appAuthorType ?? 'org' },
  )

  // Image / PDF in / text out (multimodal):
  const r2 = await executeImageAction(
    'legacy_ocr',
    rc.orgId,
    'Please OCR this PDF',
    pdfBase64,
    'application/pdf',
    undefined,
    { callerAuthorType: rc.appAuthorType ?? 'org' },
  )

  // A real route returns whichever result it needs; r2 is shown only to illustrate the multimodal call.
  return NextResponse.json({ data: { text: r.content } })
}
Always pass callerAuthorType: rc.appAuthorType ?? 'org' so the grant gate runs for org-installed apps and bypasses for platform apps. The appAuthorType field is populated by buildRequestContext via a 2-hop lookup (ipos_installations → ipos_apps.author_type).
Step 5 — What you MUST NOT do
These will fail npm run lint:ai-gate (and conceptually break the centralization promise):
// ❌ direct SDK import in app/api/platform/v1/**
import { GoogleGenerativeAI } from '@google/generative-ai'
// ❌ direct provider env var
const k = process.env.MISTRAL_API_KEY
// ❌ direct fetch to provider host
await fetch('https://api.openai.com/v1/chat/completions', { ... })
The only escape hatch is a top-of-line // eslint-disable-next-line ai-centralization/* -- ai-gate-allowlist: <reason> and that should be reserved for orchestrator-tier code that genuinely needs a feature executeAction doesn't yet support (e.g., OpenAI tool_calls). For ALL customer-facing AI features → centralize.
Step 6 — Calling AI from inside the zip (browser-side)
Don't do it directly. Browser-side code can't hold a provider key, and it shouldn't touch a provider URL even if it could (CORS, key leakage, no logging, no quota tracking).
Instead: have the browser call a platform route (which you may need to request — see "Platform Route Catalog" → "Requesting a new route") that wraps executeAction server-side.
Step 7 — Surfacing admin disable cleanly
If admin disables a grant, your app's call to executeAction throws:
Error: Action 'classify' has been disabled by admin for this org.
Catch it and show a user-friendly message:
try {
  const r = await fetch('/api/platform/v1/my-route', { ... })
  if (!r.ok) {
    const err = await r.json()
    if (err?.error?.message?.includes('disabled by admin')) {
      // User-facing text: "The AI feature has been disabled by the administrator; please contact your organization admin"
      setError('AI 功能已被管理員停用,請聯絡您的組織管理員')
      return
    }
    throw new Error(err?.error?.message ?? `HTTP ${r.status}`)
  }
} catch (e) { ... }
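If several screens call AI-backed routes, the string matching can live in one place. The sketch below is a hypothetical helper (classifyAiRouteError is not part of platform-js). It assumes the platform route forwards the executeAction error text unchanged in error.message, which is confirmed above only for the "disabled by admin" case; the "no provider/model configured" branch relies on the message quoted in Step 3 reaching the browser in the same way.

```typescript
type AiRouteFailure = 'disabled_by_admin' | 'not_configured' | 'other'

// Hypothetical helper: map a platform-route error message to a coarse failure kind.
function classifyAiRouteError(message: string | undefined): AiRouteFailure {
  if (!message) return 'other'
  // Grant disabled by an admin for this org (Step 7).
  if (message.includes('disabled by admin')) return 'disabled_by_admin'
  // Auto-created stub slug that has no provider/model yet (Step 3).
  if (message.includes('no provider/model configured')) return 'not_configured'
  return 'other'
}
```

The UI can then switch on the returned kind to pick a message, instead of repeating substring checks in every fetch handler.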
Quick reference — currently-registered slugs (snapshot, may be stale)
The live agent table is auto-appended to the bottom of this markdown at request time. Use that as the source of truth. Snapshot at time of writing:
| slug | input | use case |
|---|---|---|
| classify | text | Classify an email body into an action category |
| legacy_ocr | PDF (base64) | Mistral OCR via OpenRouter (full-page OCR) |
| scan_doc_classify | text | Scan-doc Layer A: classify a document into doc_type/court/case_number etc. |
| scan_doc_classify_pro | text | Scan-doc Layer A pro tier: higher accuracy classifier |
| scan_doc_layer_b_ask | image | Scan-doc Layer B: high-DPI sub-region re-ask for stamps/dates/case nums |
| scan_doc_ocr_primary | image | Tier 1 OCR: Qianfan-OCR-Fast (free) |
| scan_doc_ocr_fallback_1 | image | Tier 2 OCR: Qwen2.5-VL 72B |
| scan_doc_ocr_fallback_2 | image | Tier 3 OCR: Gemini 2.5 Flash |
The auto-generated table below this document supersedes this snapshot.
Part 1 — Diagnosis
Part 1: Capability Diagnosis (Non-Technical)
Who fills this: Business owner, PM, or anyone who understands what the app does — no coding knowledge needed.
Output: A filled SDK Spec Sheet, ready to hand off to a developer.
Step 1: Feature Inventory
List every capability of your app, one row per capability. Think in terms of what the app does, not how it's built.
Good rows (specific capabilities):
- "Extract text from uploaded PDF"
- "Download PDFs from USPTO website"
- "Show a list of patent numbers the user can edit"
- "Save the patent numbers to the shared org database"
Bad rows (too coarse — split them):
- "Process the document" — what does processing mean?
- "The main workflow" — not a capability
- "Handle errors" — that's plumbing, not a feature
| Feature | What it does (one sentence) |
|---|---|
| (one capability per row) | (specific user-visible action) |
Step 1A: Agent State and Control Inventory
Every converted app must be controllable by the OS Agent. Inventory all runtime state and all user-visible controls before deciding buckets.
State means anything the user can see or that affects the workflow:
- Textarea/input values
- Uploaded file metadata
- Current step/view/tab/modal
- Selected row/item/email/case/document
- Toggle/slider/dropdown values
- Parsed/derived data
- Progress and pending jobs
- Success/failure results
- Validation and runtime errors
- Whether buttons/actions are enabled or disabled
Controls means anything the user can do:
- Type/set a value
- Click a button
- Toggle a checkbox
- Move a slider
- Select/remove/reorder a row
- Start/cancel/reset a workflow
- Download/export/save/push to DB
Fill both tables:
| State field | What user sees / why it matters | Sensitive or large? | Agent representation |
|---|---|---|---|
| (e.g. rawInput) | (textarea contents) | No | full string |
| (e.g. uploadedPdf) | (selected file) | Large | name, size, page count only |
| Control | Equivalent Agent action | Params | Disabled when |
|---|---|---|---|
| (e.g. Start button) | start | none | no valid input / already running |
| (e.g. textarea edit) | set_input | { value: string } | workflow running |
If something is intentionally not exposed to the Agent, write the reason here. "It is React state" is not a valid reason.
Step 2: Decision Tree (run for each feature)
For each feature, answer in order:
Q1: Can this run entirely in a web browser with no external HTTP calls?
    Pure UI · data transformation · calculations · file manipulation
    → YES: Bucket = zip
    → NO: continue
Q2: Does the feature read or write data already stored in IPOS?
    cases · emails · prior_art · documents · etc.
    → YES: Bucket = platform-js SDK (use platform.db())
    → NO: continue
Q3: Does the IPOS Platform Route Catalog have a matching route?
    See platform-route-catalog.md
    → YES: Bucket = platform route (note the endpoint)
    → NO: continue
Q4: File a Platform Route Request to IPOS.
    IPOS builds it → adds to catalog → then Bucket = platform route.
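For readers who prefer code, the four questions can be written as a single function. This is an illustrative sketch only: chooseBucket, FeatureAnswers, and the "request needed" label are names made up for this example, and the answers to Q1 to Q3 still have to come from a person reviewing the feature.

```typescript
type Bucket = 'zip' | 'platform-js SDK' | 'platform route' | 'platform route (request needed)'

// Hypothetical record of the answers for one feature row.
interface FeatureAnswers {
  runsFullyInBrowser: boolean // Q1
  touchesIposData: boolean    // Q2
  catalogRoute?: string       // Q3: matching endpoint from the route catalog, if any
}

function chooseBucket(a: FeatureAnswers): { bucket: Bucket; resource?: string } {
  if (a.runsFullyInBrowser) return { bucket: 'zip' }                                        // Q1 yes
  if (a.touchesIposData) return { bucket: 'platform-js SDK', resource: 'platform.db()' }    // Q2 yes
  if (a.catalogRoute) return { bucket: 'platform route', resource: a.catalogRoute }         // Q3 yes
  return { bucket: 'platform route (request needed)' }                                      // Q4: file a request
}
```

The order of the checks matters: a feature that could run in the browser is placed in the zip bucket even if a platform route for it also exists.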
Common patterns
| If the feature is... | Bucket | Why |
|---|---|---|
| Parsing text in the browser | zip | No HTTP needed |
| Math, regex, formatting | zip | Pure JS |
| OCR / Vision / LLM call | platform route | Holds API key server-side |
| Fetch from external API blocked by CORS | platform route | Browser can't bypass CORS |
| Fetch from external API blocked by IP | platform route + CF Worker | Datacenter IPs often blocked; see gotchas |
| Saving to org's case / email / prior_art | platform-js SDK | platform.db('org:<table>') |
| Analyze document with AI (Vision/LLM) | platform route | API key is platform-managed (set by IPOS admin on Zeabur, not by the app developer). App calls the route; key is invisible to it. See Pattern E in route-design-patterns.md |
| Save output files to 檔案總管 | platform-js SDK | platform.appFileUpload() — requires org_write: ["app_files"] in platform.json.permissions |
| File download to user's machine | zip | <a download> after fetching bytes |
| Reading user's session info | platform-js SDK | platform.session.{user,org} |
| Real-time updates from server | ⚠️ not yet supported | WebSocket relay is future work |
Scope Decision: Strict Port vs Feature Expansion
Before filling the spec sheet, make one explicit decision — this prevents scope creep from derailing the conversion.
Strict port (default): Recreate the original app feature-for-feature. Nothing added, nothing removed. Every row in Step 3 maps to something the original app already did.
Feature expansion: Add new capabilities during the conversion. Only do this when the addition is clearly adjacent AND small. Each expansion must survive: "Would I build this separately if the port were already done?" If no → cut it.
When expanding scope, add a Scope column to the spec sheet to flag what's new:
| Feature | Scope | Bucket | Resource |
|---|---|---|---|
| (original feature) | PORT | ... | ... |
| (new addition) | NEW | ... | ... |
PORT rows must achieve parity with the original. NEW rows require explicit stakeholder sign-off before a developer starts. When in doubt: default to strict port, ship it, then handle expansions as a follow-up.
Step 3: Fill the SDK Spec Sheet
Combine inventory + decisions into one table:
| Feature | Needs server? | Bucket | Resource |
|---|---|---|---|
| (feature name) | Yes / No / IPOS data | zip / platform route / platform SDK | (library, endpoint, or SDK method) |
Then add the Agent Control Sheet:
| Requirement | Contract |
|---|---|
| Complete state schema | (link/list all fields from Step 1A) |
| Action schema | (list every action ID and params) |
| Transport | server /api/agent/* OR static platform-js bridge |
| Manifest hints | platform.json.agent_actions entries |
| Verification | how the Agent will read state and invoke actions |
Worked example (g1001 — USPTO patent extractor)
| Feature | Needs server? | Bucket | Resource |
|---|---|---|---|
| Upload PDF | No | zip | <input type="file"> |
| Extract text from PDF | No | zip | pdfjs-dist |
| Detect scanned PDF | No | zip | pdfjs-dist (text length heuristic) |
| OCR scanned PDF | Yes (API key) | platform route | /api/platform/v1/ocr |
| Filter PTO-892 section | No | zip | JS regex |
| Extract patent numbers | No | zip | JS regex |
| Display + edit list | No | zip | React |
| Download USPTO PDFs | Yes (CORS + IP block) | platform route | /api/platform/v1/uspto-proxy |
| Package as ZIP | No | zip | jszip |
| Push patent numbers to org DB | IPOS data | platform-js SDK | platform.db('org:prior_art').insert() |
Agent Control Sheet example:
| Requirement | Contract |
|---|---|
| Complete state schema | step, file metadata, extracted patents, edited patents, download progress, DB push result, errors |
| Action schema | set_file, extract, set_patents, start_download, cancel, reset |
| Transport | static platform-js bridge |
| Manifest hints | get_context plus all action IDs |
| Verification | Agent can inspect current step and invoke every workflow button/action |
Step 4: List New Platform Routes Needed
Scan the Resource column. Any row pointing at a route that's not in platform-route-catalog.md → file a request:
| Route needed | Inputs | Expected output | App(s) | Why server-side? |
|---|---|---|---|---|
| (path or one-line description) | (JSON / file / params) | (JSON / file bytes) | (app name) | (CORS / API key / IP block / ...) |
The "Why server-side?" column is critical — it tells the route designer whether they need a CF Worker (IP block), a credential vault (API key), or just a CORS proxy.
Handoff Checklist
Before passing the SDK Spec Sheet to a developer:
- [ ] Every app feature has exactly one row
- [ ] Every "platform route" row has a specific endpoint (existing in catalog or confirmed by IPOS)
- [ ] No row says "customer's own server" / "Lambda we maintain" / "our backend"
- [ ] IPOS data features all use platform SDK (not direct DB credentials)
- [ ] Pyodide rows acknowledge it's pure-Python only (no C extensions, no network)
- [ ] New routes (Step 4) are filed and have a "why server-side" reason
- [ ] Every visible state field is represented in the Agent state schema
- [ ] Every user-visible control has an equivalent Agent action
- [ ] Static apps identify the platform-js bridge requirement; server apps identify /api/agent/* endpoints
- [ ] platform.json.agent_actions is planned for every context/action entry
The completed sheet is the developer's contract. They should be able to implement each row without re-asking what the app does.
Part 2 — Implementation
Part 2: Implementation Guide (Technical)
Who reads this: The developer who received the SDK Spec Sheet from Part 1.
Work through each row of the SDK Spec Sheet. Each bucket has a recipe below. Then scaffold, build, deploy, verify.
Mandatory: Agent Control Surface
Before coding UI internals, read references/agent-control-surface.md and turn
the Agent Control Sheet into code. Every app must expose:
- Complete runtime state: all visible inputs, selections, current step/view, progress, results, errors, and enabled/disabled action state.
- Complete controls: every button, toggle, slider, text input, selection, start/cancel/reset/download/save action must have an Agent-callable equivalent.
- Manifest hints: every state/action entry must be declared in platform.json.agent_actions.
Server/runtime app recipe:
GET /api/agent/context
POST /api/agent/action
Static zip app recipe:
// Desired platform-js bridge API. If this does not exist yet in the repo,
// the app is not fully Agent-controllable and the platform prerequisite must
// be documented before marking the conversion complete.
platform.agent.updateContext(agentContext)
platform.agent.registerAction('set_input', async ({ value }) => { ... })
platform.agent.registerAction('start', async () => { ... })
platform.agent.registerAction('cancel', async () => { ... })
platform.agent.registerAction('reset', async () => { ... })
Do not use DOM scraping, localStorage, or iframe inspection as the control
surface. Static apps must push state and receive actions through platform-js
postMessage via the OS shell.
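Independently of how the bridge is transported, the snapshot handed to the Agent can be derived from app state with a pure function. The following is a minimal sketch under stated assumptions: AppState, AgentContext, and buildAgentContext are hypothetical names for a small "paste input, then run" app. The start and set_input rules follow the Step 1A example table, while the cancel and reset rules are assumptions made for illustration.

```typescript
// Hypothetical app state for a small "paste input, then run" workflow.
interface AppState {
  rawInput: string
  running: boolean
  result: string | null
  error: string | null
}

interface AgentActionState {
  id: string
  enabled: boolean
  disabledReason?: string
}

// Snapshot the Agent would read: the full state plus which actions are currently callable.
interface AgentContext {
  state: AppState
  actions: AgentActionState[]
}

// An action is enabled unless a reason for disabling it is given.
function action(id: string, disabledReason?: string): AgentActionState {
  return disabledReason ? { id, enabled: false, disabledReason } : { id, enabled: true }
}

function buildAgentContext(state: AppState): AgentContext {
  const hasInput = state.rawInput.trim().length > 0
  const startBlocked = state.running ? 'already running' : hasInput ? undefined : 'no valid input'
  return {
    state,
    actions: [
      action('set_input', state.running ? 'workflow running' : undefined),
      action('start', startBlocked),
      action('cancel', state.running ? undefined : 'nothing is running'), // assumed rule
      action('reset', state.running ? 'workflow running' : undefined),    // assumed rule
    ],
  }
}
```

An app would rebuild this snapshot after every state change and pass it to the bridge (platform.agent.updateContext in the desired API above), so the Agent never sees stale enabled/disabled flags.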
Recipe: Bucket = zip (JS/TS logic)
Place pure logic in apps/<slug>-app/src/lib/, import in components.
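// src/lib/normalize-patent.ts
// Strip surrounding whitespace, an optional leading "US", thousands separators, and leading zeros.
export function normalizePatentNumber(raw: string): string {
  // Trim first so that input such as " US9,999,999" still matches the leading "US".
  const digits = raw.trim().replace(/^US\s*/i, '').replace(/,/g, '').replace(/^0+/, '')
  return digits || '0' // empty or all-zero input collapses to '0'
}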
If the original is Python and pure (no C extensions, no filesystem, no subprocess), prefer porting to TypeScript. Use Pyodide only when the Python is too gnarly to port (numpy heavy / regex with Python-specific lookbehind / etc.).
Recipe: Bucket = zip (React UI)
Mirror the structure of apps/g1001-app/src/. Use plain React, Next.js Pages Router (not App Router — static export support is more reliable on Pages).
// src/components/MyView.tsx
import React from 'react'

interface Props { items: string[]; onSelect: (item: string) => void }

export default function MyView({ items, onSelect }: Props) {
  return (
    <ul>
      {items.map((item) => (
        <li key={item}>
          {item} <button onClick={() => onSelect(item)}>選</button>
        </li>
      ))}
    </ul>
  )
}
Recipe: Bucket = zip (Pyodide — pure Python)
Use only when porting to JS would lose a critical dependency. Limits: no C extensions (no pandas w/ native, no cv2, no pyarrow), no network calls (Pyodide can't bypass CORS or hold secrets — that's what platform routes are for).
// src/lib/pyodide-runner.ts
// Cache the loading promise (not the instance) so concurrent first calls share one download.
let pyodideReady: ReturnType<typeof import('pyodide').loadPyodide> | null = null

export async function runPython(code: string, inputs: Record<string, unknown> = {}): Promise<unknown> {
  if (!pyodideReady) {
    pyodideReady = import('pyodide').then(({ loadPyodide }) =>
      loadPyodide({ indexURL: 'https://cdn.jsdelivr.net/pyodide/v0.26.4/full/' }),
    )
  }
  const pyodide = await pyodideReady
  // Expose each input as a Python global before running the code.
  for (const [k, v] of Object.entries(inputs)) pyodide.globals.set(k, v)
  return pyodide.runPython(code)
}
Recipe: Bucket = platform route (JSON in/out)
const res = await fetch('/api/platform/v1/<route>', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ /* per catalog */ }),
})
const body = await res.json().catch(() => null)
if (!res.ok) {
  // Errors come back as { error: { code, message } }
  throw new Error(body?.error?.message ?? `${res.status}`)
}
const { data } = body
Recipe: Bucket = platform route (binary in/out)
For routes returning file bytes:
const res = await fetch('/api/platform/v1/uspto-proxy', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ patent_number: 'US9999999' }),
})
if (!res.ok) {
  // Try to parse JSON error envelope before giving up
  const err = await res.json().catch(() => null)
  throw new Error(err?.error?.message ?? `Download failed: ${res.status}`)
}
const buffer = await res.arrayBuffer()
Recipe: Bucket = platform-js SDK
Copy apps/g1001-app/src/vendor/platform-js.ts into your app's src/vendor/. Then in your top-level component:
import { useEffect, useState } from 'react'
import { Platform } from './vendor/platform-js'
import manifest from '../platform.json'

// Inside your top-level component:
const [platform, setPlatform] = useState<Platform | null>(null)
useEffect(() => {
  Platform.connect({ targetOrigin: '*' })
    .then(async (p) => {
      setPlatform(p)
      // REQUIRED: always include version so the OS title bar and Agent can identify
      // which build is running. Format: "<App Name> v<version>"
      await p.setTitle(`${manifest.name} v${manifest.version}`)
    })
    .catch(console.error)
}, [])
if (!platform) return <div>Connecting...</div>

// Once connected, inside async handlers:
// Read
const rows = await platform.db('org:prior_art').select<PriorArt>()
// Insert (single row only; see gotchas if you have an array)
await platform.db('org:prior_art').insert({ patent_number: 'US9999999', country: 'US' })
// Update by id
await platform.db('org:cases').update(caseId, { status: 'in_progress' })
// Notify (toast in OS shell)
await platform.notify('已推送 5 個專利', 'success')
// Session
const { user, org } = platform.session
Mandatory: setTitle must always be called on connect with the format `${manifest.name} v${manifest.version}`. Never hardcode the version string; always read it from platform.json so that the manifest stays the single source of truth.
⚠️ Three SDK gotchas, all hit during real builds:
- .insert() only accepts a single object, not an array. To insert N rows, loop: for (const r of rows) await platform.db(...).insert(r). See references/gotchas.md.
- Only whitelisted columns work: lib/platform/org-table-config.ts declares insertable_columns per table. Sending an unlisted column = 400. New table column ≠ writeable until config updated.
- Duplicate-key errors come back as db_error 500: pattern-match the message (/duplicate key|unique constraint/i) to dedupe gracefully.
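The first and third gotchas combine naturally into one helper: insert rows one at a time and treat duplicate-key failures as "already there". The sketch below is hypothetical (insertAll and InsertableTable are not part of platform-js) and assumes that a failed insert rejects with an Error whose message contains the database error text.

```typescript
// Minimal shape of the table handle used here; the real platform-js type may differ.
interface InsertableTable<Row> {
  insert(row: Row): Promise<unknown>
}

const DUPLICATE_KEY = /duplicate key|unique constraint/i

// Hypothetical helper: insert rows sequentially and skip rows that already exist.
async function insertAll<Row>(
  table: InsertableTable<Row>,
  rows: Row[],
): Promise<{ inserted: number; skipped: number }> {
  let inserted = 0
  let skipped = 0
  for (const row of rows) {
    try {
      await table.insert(row) // one row per call; arrays are not accepted
      inserted++
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e)
      if (!DUPLICATE_KEY.test(message)) throw e // unrelated failure: surface it
      skipped++ // duplicate key: the row is already present
    }
  }
  return { inserted, skipped }
}
```

Usage would look like `await insertAll(platform.db('org:prior_art'), rows)`, with the returned counts feeding a notify() toast.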
App Scaffold
Use apps/g1001-app/ as your template. Required files:
apps/<slug>-app/
├── platform.json         # manifest; schema below
├── package.json          # next 14.2.33, react 18, ...
├── next.config.js        # output: 'export', basePath, assetPrefix (see template)
├── tsconfig.json
├── pages/
│   ├── _app.tsx          # minimal wrapper
│   └── index.tsx         # dynamic(() => import('../src/MyApp'), { ssr: false })
└── src/
    ├── MyApp.tsx         # top-level, calls Platform.connect()
    ├── vendor/
    │   └── platform-js.ts   # copy from g1001-app
    ├── steps/ or components/
    └── lib/
platform.json (required fields all validated on upload)
{
  "schema_version": "1.0",
  "slug": "<your-slug>",
  "name": "<Display Name>",
  "description": "<optional>",
  "version": "1.0.0",
  "runtime": "nextjs",
  "window": {
    "default_width": 820,
    "default_height": 640,
    "resizable": true
  },
  "permissions": {
    "org_write": ["<table>"],
    "org_read": ["<table>"]
  },
  "agent_actions": [
    {
      "id": "get_context",
      "description": "Returns the complete AgentContext snapshot for the current app window."
    },
    {
      "id": "<action_id>",
      "description": "Agent-callable equivalent of a user-visible control. Include params in plain language."
    }
  ]
}
⚠️ Bump version for every redeploy. The deploy endpoint is idempotent on (slug, version) — if you re-upload with the same version, you get back the previous deploymentId (including FAILED ones). New version = new deploy.
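Because a forgotten bump silently returns the old deployment, it can help to automate it. The helper below is a hypothetical sketch (bumpPatch is not an IPOS tool) and assumes the version is a plain major.minor.patch string like the "1.0.0" shown in the manifest above.

```typescript
// Hypothetical helper: increment the patch component of a "major.minor.patch" version string.
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!match) throw new Error(`expected major.minor.patch, got "${version}"`)
  const [, major, minor, patch] = match
  return `${major}.${minor}.${Number(patch) + 1}`
}
```

A build script could read platform.json, apply bumpPatch to its version field, and write the file back before zipping.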
Cross-device session persistence (optional)
If your app has user-meaningful in-app state (drafts, view selection, pinned
items) that should survive a browser close on machine A and re-open on machine
B, add "session_persistence": true to your manifest and implement the
session.snapshot postMessage emission + ipos.session.restore handler. See
references/session-persistence.md.
next.config.js (template — exact)
const platform = require('./platform.json');
const basePath = `/api/app-static/${platform.slug}`;

module.exports = {
  reactStrictMode: true,
  output: 'export',
  basePath,
  assetPrefix: basePath,
  trailingSlash: true,
  images: { unoptimized: true },
  generateBuildId: async () => null,
};
Missing basePath/assetPrefix → all _next/static/... assets 404 after install. The IPOS shell serves the app from /api/app-static/<slug>/ so paths must be prefixed.
Build, Package, Deploy
cd apps/<slug>-app
# 1. Install + build → produces out/
npm install
npm run build
# 2. Zip — MUST use bash zip, NOT PowerShell Compress-Archive
# PowerShell creates backslash entries which DevConsole rejects as "Malicious entry"
# out/ MUST be a sub-folder inside the zip, not the zip's root
zip -r ../<slug>-<version>.zip platform.json out/
# Verify the zip looks right
unzip -l ../<slug>-<version>.zip | head
# Expected:
# platform.json
# out/
# out/index.html
# out/_next/...
Deploy via API (recommended — DevConsole UI is fragile)
TOKEN='ipos_live_<your-dev-token>' # localStorage.ipos_dev_token in DevConsole
SLUG='<slug>'
# Step 1: register app (skip if exists; 409 = already exists, that's fine)
curl -X POST 'https://ipdesk.ai/api/ext/v1/apps' \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d "{\"slug\":\"$SLUG\",\"name\":\"...\",\"description\":\"...\"}"
# Step 2: upload + deploy
curl -X POST "https://ipdesk.ai/api/ext/v1/apps/$SLUG/deploy" \
-H "Authorization: Bearer $TOKEN" \
-F "zip=@./<slug>-<version>.zip" \
-F "manifest=__from_zip__"
# Returns: { "deploymentId": "...", "status": "building" }
# Step 3: poll until live (5-15 sec typical)
DID='<deploymentId from step 2>'
for i in $(seq 1 10); do
curl -s "https://ipdesk.ai/api/ext/v1/deployments/$DID" \
-H "Authorization: Bearer $TOKEN"
echo
sleep 3
done
# Expect: status: "live", runtime_url: "/api/app-static/<slug>/"
# Step 4: stream build log (if it failed)
curl -N "https://ipdesk.ai/api/ext/v1/deployments/$DID/log/stream" \
-H "Authorization: Bearer $TOKEN"
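The Step 3 loop above always runs ten times. A script that stops as soon as the deployment settles could look like the sketch below. It assumes the status field is "building" while in progress, "live" on success, and some casing of "failed" on failure (the exact failure string is not shown in this document); decide and pollDeployment are hypothetical names, and getStatus/sleep are injected so the loop can run against the real endpoint or a fake.

```typescript
type PollDecision = 'done' | 'failed' | 'wait' | 'timeout'

// Assumption: "live" = success, any casing of "failed" = failure,
// anything else (e.g. "building") = still in progress.
function decide(status: string, attempt: number, maxAttempts: number): PollDecision {
  const s = status.toLowerCase()
  if (s === 'live') return 'done'
  if (s === 'failed') return 'failed'
  return attempt >= maxAttempts ? 'timeout' : 'wait'
}

// getStatus would wrap GET /api/ext/v1/deployments/<id>; sleep wraps setTimeout.
async function pollDeployment(
  getStatus: () => Promise<string>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 10,
  intervalMs = 3000,
): Promise<PollDecision> {
  for (let attempt = 1; ; attempt++) {
    const decision = decide(await getStatus(), attempt, maxAttempts)
    if (decision !== 'wait') return decision
    await sleep(intervalMs)
  }
}
```

On 'failed' or 'timeout', the Step 4 log stream is the next place to look.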
Deploy via DevConsole UI (alternative)
- Open IPOS OS → Apps → 開發者主控台 (Developer Console)
- Drop zip into the dropzone
- Click 上傳並部署 (Upload and Deploy)
- Watch build log stream
If the UI says "Malicious entry" → see references/gotchas.md (PowerShell zip).
Flow Compliance Report (give the customer an HTML they can read)
Before declaring the conversion complete, produce a plain-language HTML report
for the customer. Full instructions + template in
references/flow-compliance-report.md.
Steps:
- Ask the customer where to put it. Folder structure varies per customer. Default: docs/<slug>-flow-compliance.html. Confirm before writing.
- Do not include this file in the deploy zip; it's documentation, not app code.
- Audience is the customer, not engineers. No endpoint, route, iframe, manifest, bucket, or postMessage jargon: translate every term.
- Cover sections A–F from the template:
  - A. Their original flow (in their own words)
  - B. The new flow on IPOS
  - C. Compliance checklist (✅ / ⚠️ / ❌ against IPOS hard rules)
  - D. Remediation plan for every ⚠️ / ❌ (who does what, by when, can they still use it in the meantime?)
  - E. What the OS Agent can see and do (translate every agent_actions entry into "things you can ask the assistant to do")
  - F. Which IPOS cloud services (platform routes) this app uses, and that IPOS, not the customer, maintains them
- Update the report once after deploy succeeds: fill in the actual deploymentId and version, flip status from "draft" to "live".
Verification Checklist
- [ ] Agent can read the complete state snapshot for the current window/app
- [ ] Agent can invoke every user-visible control through declared actions
- [ ] platform.json.agent_actions includes get_context plus every action ID
- [ ] Disabled Agent actions return a reason instead of silently no-oping
- [ ] Window title shows <App Name> v<version> (from platform.json, not hardcoded)
- [ ] App opens in OS iframe without console errors
- [ ] Every row of the SDK Spec Sheet works end-to-end
- [ ] _next/static/... assets load (no 404s; confirms basePath is correct)
- [ ] iframe sandbox in components/os/Window.tsx includes allow-downloads if app triggers downloads
- [ ] Re-running the flow doesn't crash on duplicate inserts (dedupe via /duplicate key/i match)
- [ ] No raw API keys / DB credentials anywhere in the app source (grep -rE 'insforge|sk_|service_role' apps/<slug>-app/src/)
- [ ] platform.notify(...) actually shows in the OS toast
- [ ] After OS reload, the app still works (storage bucket persistence)
- [ ] Flow Compliance HTML report exists at the customer-confirmed location, written in plain language, with all ⚠️/❌ items having a remediation plan
Common bugs and where they're documented
| Symptom | Where to look |
|---|---|
| Malicious entry: 404\index.html on upload | references/gotchas.md → "PowerShell zip" |
| does not allow inserts via SDK | references/gotchas.md → "Read-only table" |
| column "..." is not in insertable_columns | Same entry (column whitelist enforcement) |
| 502 Bad Gateway from a platform route | references/gotchas.md → "Cloudflare 502 stripping" |
| 403 from upstream API in platform route | references/gotchas.md → "Datacenter IP block / CF Worker" |
| iframe download blocked | references/gotchas.md → "iframe allow-downloads" |
| Same deploymentId on retry | references/gotchas.md → "Idempotent deploy by version" |
| _next/static/* 404 | next.config.js template above (basePath / assetPrefix) |
| cloudClient: ... not set 500 | env vars: INSFORGE_API_KEY + NEXT_PUBLIC_INSFORGE_BASE_URL on Zeabur |
For each, references/gotchas.md has the root cause and the fix.
Platform Route Catalog
Platform routes are server-side endpoints maintained by IPOS. Apps inside the OS iframe call them with fetch() — auth is handled automatically via the shell token (no Bearer needed).
Live routes
| Route | Method | Purpose | Request body | Response | Auth |
|---|---|---|---|---|---|
| /api/platform/v1/ocr | POST | OCR a (scanned) PDF via Mistral | { pdf_base64: string } | { data: { text: string } } | shell token / API key |
| /api/platform/v1/uspto-proxy | POST | Fetch USPTO patent PDF (CORS + IP bypass) | { patent_number: string } | PDF bytes (application/pdf) | shell token / API key |
| /api/platform/v1/prior-art/bulk | POST | Upsert prior art + link to case | { patent_numbers: string[], country?: string, case_id?: string, oa_date?: string } | { data: { inserted, linked, prior_art[] } } | Bearer (headless) |
| /api/platform/v1/search | GET | Cross-entity search (cases, emails, files) | ?q=...&entity=...&limit=... | { data: { results[] } } | shell token / API key |
| /api/platform/v1/org/<table> | GET/POST/PATCH | Generic CRUD on whitelisted tables | see lib/platform/org-table-config.ts | { rows? row? } | shell token only |
| /api/platform/v1/org/<table>/bulk | PATCH | Bulk update by id list (≤100) | { ids: string[], data: {...} } | { updated, rows } | shell token only |
| /api/platform/v1/scan-doc/analyze | POST | Analyze combined PDF with Gemini Vision: detect document boundaries, case numbers, dates, suggested filenames | { pdf_b64: string } (max 10 MB) | { data: { documents: [{ doc_type, court, case_number, receipt_date, doc_date, start_page, end_page, suggested_filename, has_attachments, attachment_start_page }], page_count: number } } | shell token / cookie session |
| /api/platform/v1/scan-doc/split | POST | Split a combined PDF into per-document PDFs using explicit page lists | { pdf_b64: string, documents: [{ filename: string, pages: number[] }] } | { data: { files: [{ filename: string, pdf_b64: string }] } } | shell token / cookie session |
| /api/app-static/[slug]/[...path] | GET | Serve static app files from bucket | none | file bytes | iframe origin |
Notes per route
/api/platform/v1/scan-doc/analyze: sends the PDF inline to Gemini 2.5 Flash Vision via inlineData, so no server-side PDF rendering is needed. It uses responseSchema with the SchemaType enum for structured JSON output and returns one entry per logical document found in the combined scan. Max PDF size is 10 MB. Note: the current implementation reads the GEMINI_API_KEY env var directly as a temporary shortcut. Ideally it would call executeAction() from lib/ai/actions.ts (the AI Actions system), so the admin can toggle the model at /admin/ai/actions and the app shows "AI 尚未啟動,請洽管理員" ("AI is not enabled yet; please contact your administrator") when disabled. This is the refactor target once executeActionMultimodal() is available.
/api/platform/v1/scan-doc/split: pure server-side PDF splitting via pdf-lib, with no external API calls. It accepts the same PDF bytes plus an array of { filename, pages: number[] }, where pages is an explicit list of 1-indexed page numbers in output order (which supports page reordering). It returns each document as base64 for the app to upload via platform.appFileUpload().
/api/platform/v1/uspto-proxy — internally fetches via USPTO_PROXY_URL env var (Cloudflare Worker at uspto-proxy.<sub>.workers.dev) because USPTO blocks Zeabur datacenter IPs. Returns 503 (NOT 502 — Cloudflare strips the body of 502 responses) on failure. See references/route-design-patterns.md → "External API blocked by IP" pattern.
/api/platform/v1/org/<table> — only tables listed in lib/platform/org-table-config.ts are exposed. Table needs insertable_columns: [...] to allow POST and writable_columns: [...] to allow PATCH. Both default to disabled. The route always injects org_id server-side; never accept id or org_id from the caller.
/api/platform/v1/prior-art/bulk vs platform.db('org:prior_art').insert() — the bulk route is for headless callers (Bearer-authed); the SDK is for in-iframe apps. Both work; pick based on caller.
How to call a platform route from inside an app
// In any component or lib — shell token added automatically by the SDK proxy
const res = await fetch('/api/platform/v1/ocr', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ pdf_base64: base64String }),
})
const { data, error } = await res.json()
if (!res.ok || error) throw new Error(error?.message ?? `HTTP ${res.status}`)
console.log(data.text)
For routes that return file bytes (e.g. uspto-proxy):
const res = await fetch('/api/platform/v1/uspto-proxy', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ patent_number: 'US9999999' }),
})
if (!res.ok) {
const err = await res.json().catch(() => null)
throw new Error(err?.error?.message ?? `Download failed: ${res.status}`)
}
const buffer = await res.arrayBuffer()
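Both snippets repeat the same "check status, then check the error field" logic. A small wrapper can centralize it. This is a sketch under the assumption that the route uses the { data, error } envelope shown in the OCR example (the org/<table> routes list a different response shape); unwrap, Envelope and PlatformError are hypothetical names, not part of platform-js.

```typescript
interface PlatformError { code?: string; message?: string }
interface Envelope<T> { data?: T | null; error?: PlatformError | null }

// Hypothetical helper: turn (HTTP status, parsed JSON body) into data,
// or throw an Error carrying the route's message or the HTTP status.
function unwrap<T>(status: number, body: Envelope<T> | null): T {
  const ok = status >= 200 && status < 300
  if (!ok || !body || body.error || body.data == null) {
    throw new Error(body?.error?.message ?? `HTTP ${status}`)
  }
  return body.data
}
```

Usage inside an app would be along the lines of unwrap(res.status, await res.json().catch(() => null)), which also covers responses whose body is not JSON.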
Requesting a new route
If the diagnosis tree (Part 1) reaches Q4, file a Platform Route Request with:
- What — one-sentence description of what the route should do
- Inputs — JSON fields, file types, max sizes
- Outputs — JSON shape, content-type, max size
- Why server-side: pick the right reason:
  - cors: external API has no Access-Control-Allow-Origin
  - ip_block: external API blocks datacenter IPs (needs CF Worker proxy)
  - secret: needs an API key/credential held by IPOS
  - compute: needs server CPU/RAM that's too heavy for browser
  - cross_org_join: needs a SQL query that spans tables an SDK call can't reach
- Which apps need the route
When IPOS builds the route, it lands in this catalog. The customer's zip never changes — only the platform gains capability.
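The request fields listed above can be captured as a typed record so incomplete requests are caught before filing. This is only an illustrative sketch: the field names (what, inputs, outputs, whyServerSide, apps) are assumptions for this example, and only the five reason codes come from the list above.

```typescript
type ServerSideReason = 'cors' | 'ip_block' | 'secret' | 'compute' | 'cross_org_join'

// Hypothetical record shape for a Platform Route Request.
interface PlatformRouteRequest {
  what: string
  inputs: string
  outputs: string
  whyServerSide: ServerSideReason
  apps: string[]
}

// Returns the names of fields that are absent, blank, or empty.
function missingFields(req: Partial<PlatformRouteRequest>): string[] {
  const required: Array<keyof PlatformRouteRequest> = ['what', 'inputs', 'outputs', 'whyServerSide', 'apps']
  return required.filter((key) => {
    const value = req[key]
    if (value === undefined) return true
    if (typeof value === 'string') return value.trim() === ''
    return value.length === 0
  })
}
```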
Pattern decision: where new routes belong
/api/platform/v1/<route> # public to all installed apps via shell token
/api/ext/v1/<route> # Bearer-authed, headless callers (CI, scripts, webhooks)
/api/v1/<route> # IPDesk first-party UI only (cookie session)
A route serving an SDK app should always live under /api/platform/v1/. If it also needs to be callable from a CI script, expose a parallel /api/ext/v1/<route> that wraps the same business logic. Don't expose /api/v1/ to apps — those use the user's session cookie which the iframe doesn't carry.
Reference: Agent Control Surface
Agent Control Surface
Every app connected to IPOS OS must expose a complete Agent Control Surface. The Agent must be able to know what the app is doing and invoke every user-visible control. Treat this as part of the app contract, not an optional enhancement.
Required deliverables
For every app, define these four artifacts before implementation is considered complete:
- State schema - a serializable snapshot of all runtime state the UI uses.
- Action schema - every user-visible control as an Agent-callable command.
- Manifest declarations - platform.json.agent_actions describing context and commands.
- Verification - tests or manual proof that the Agent can read state and invoke controls.
If a state field or action is deliberately excluded, document why. Valid reasons are narrow: secrets, raw file bytes too large to serialize, or dangerous actions that require an explicit human confirmation step. "It lives in React state" is not a valid exclusion.
State contract
The state snapshot must include everything needed for the Agent to answer: "What is the user looking at, what has happened, and what can be done next?"
Minimum fields:
interface AgentContext {
app: {
slug: string
version: string
route: string
}
ui: {
currentView: string
currentStep?: string
focusedControl?: string | null
modal?: string | null
}
inputs: Record<string, unknown>
derived: Record<string, unknown>
progress: Record<string, unknown>
results: Record<string, unknown>
errors: Array<{ code?: string; message: string; target?: string }>
capabilities: {
availableActions: string[]
disabledActions: Record<string, string>
}
updatedAt: string
}
Guidelines:
- Include textarea/input values, selected rows, current step, toggles, progress, success/failure counts, validation errors, and pending async operations.
- Include summaries for large data, plus IDs/names/counts. Do not send raw file blobs or huge binary payloads.
- Include enough data for the Agent to continue the workflow without guessing.
- Use stable action IDs and state field names; do not encode UI labels as API contracts.
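To make the guidelines concrete, here is a rough sketch of deriving an AgentContext from app state. The AppState shape, the demo-app slug and the disabled-reason strings are invented for illustration; the AgentContext interface is repeated only so the block stands alone.

```typescript
// Copy of the AgentContext shape above, repeated so this sketch stands alone.
interface AgentContext {
  app: { slug: string; version: string; route: string }
  ui: { currentView: string; currentStep?: string; focusedControl?: string | null; modal?: string | null }
  inputs: Record<string, unknown>
  derived: Record<string, unknown>
  progress: Record<string, unknown>
  results: Record<string, unknown>
  errors: Array<{ code?: string; message: string; target?: string }>
  capabilities: { availableActions: string[]; disabledActions: Record<string, string> }
  updatedAt: string
}

// Hypothetical app state for a batch tool; a real app maps its own fields.
interface AppState {
  rawInput: string
  rows: Array<{ id: string; status: 'pending' | 'done' | 'error'; error?: string }>
  running: boolean
}

function buildContext(state: AppState, now: Date = new Date()): AgentContext {
  const failed = state.rows.filter((r) => r.status === 'error')
  const done = state.rows.filter((r) => r.status === 'done')
  // Record why an action is unavailable instead of just omitting it.
  const disabledActions: Record<string, string> = {}
  if (state.running) disabledActions.start = 'already running'
  else if (state.rows.length === 0) disabledActions.start = 'no rows to process'
  if (!state.running) disabledActions.cancel = 'nothing is running'
  const all = ['get_context', 'set_input', 'start', 'cancel', 'reset']
  return {
    app: { slug: 'demo-app', version: '0.0.1', route: '/' },
    ui: { currentView: 'main', currentStep: state.running ? 'running' : 'idle', modal: null },
    inputs: { rawInput: state.rawInput },
    // Summaries and IDs only: no file blobs or binary payloads.
    derived: { rowIds: state.rows.map((r) => r.id) },
    progress: { total: state.rows.length, done: done.length, failed: failed.length },
    results: {},
    errors: failed.map((r) => ({ message: r.error ?? 'unknown error', target: r.id })),
    capabilities: { availableActions: all.filter((id) => !(id in disabledActions)), disabledActions },
    updatedAt: now.toISOString(),
  }
}
```

Keeping buildContext a pure function of state makes it easy to call on every state change and to test without a running shell.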
Action contract
Every user-visible control needs an equivalent action.
Examples:
| UI control | Agent action |
|---|---|
| Type into textarea | set_input |
| Change delay slider | set_delay_ms |
| Toggle push-to-DB checkbox | set_push_db |
| Start button | start |
| Cancel button | cancel |
| Reset button | reset |
| Select row | select_row |
| Remove row | remove_row |
| Download/export | download or export |
Action shape:
interface AgentActionRequest {
action: string
params?: Record<string, unknown>
}
interface AgentActionResult {
ok: boolean
state: AgentContext
error?: { code: string; message: string }
}
Rules:
- Actions should return the updated state snapshot.
- Actions must share validation with the UI path where possible.
- Dangerous or irreversible actions must support dryRun or require an explicit confirmation parameter.
- If the UI button is disabled, the action should return a structured disabled reason, not silently no-op.
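The rules above can be enforced in one dispatcher rather than in each handler. The sketch below is one possible shape, generic over the state type; the error codes UNKNOWN_ACTION and ACTION_DISABLED, the ActionSpec interface, and the treatment of dryRun as a params flag are assumptions for illustration, not defined by the platform.

```typescript
interface ActionRequest { action: string; params?: Record<string, unknown> }
interface ActionResult<S> { ok: boolean; state: S; error?: { code: string; message: string } }

interface ActionSpec<S> {
  // Returns a human-readable reason when the action is currently unavailable.
  disabledReason?: (state: S) => string | undefined
  // Pure state transition, shared with the UI path.
  run: (state: S, params: Record<string, unknown>) => S
}

function dispatch<S>(state: S, req: ActionRequest, specs: Map<string, ActionSpec<S>>): ActionResult<S> {
  const spec = specs.get(req.action)
  if (!spec) {
    return { ok: false, state, error: { code: 'UNKNOWN_ACTION', message: `no such action: ${req.action}` } }
  }
  const reason = spec.disabledReason?.(state)
  if (reason) {
    // Structured reason instead of a silent no-op.
    return { ok: false, state, error: { code: 'ACTION_DISABLED', message: reason } }
  }
  const params = req.params ?? {}
  // Dry run: report that the action would be accepted, without changing state.
  if (params.dryRun === true) return { ok: true, state }
  return { ok: true, state: spec.run(state, params) }
}
```

Every result carries the state snapshot, so the Agent always sees the outcome of its own call.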
Server app pattern
Server/runtime apps can expose HTTP endpoints directly:
GET /api/agent/context
POST /api/agent/action
GET /api/agent/context returns the current AgentContext.
POST /api/agent/action accepts { action, params }, performs the same state
transition as the UI, and returns { ok, state, error? }.
These endpoints must be shell-authenticated. Do not expose them as public unauthenticated APIs.
Static zip app pattern
Static apps have no server-side runtime, so they cannot implement
/api/agent/context by themselves. They must push state to the OS shell through
platform-js and receive action commands from the shell.
Required platform-js capabilities:
platform.agent.updateContext(context)
platform.agent.registerAction(actionId, handler)
Until these SDK methods exist in the repo, static apps are not fully Agent-controllable. Do not mark the conversion complete; document the missing platform-js bridge as a platform prerequisite.
Expected static app flow:
- Top-level app builds an AgentContext from React state.
- On every meaningful state change, app calls platform.agent.updateContext.
- App registers handlers for every action in the Action contract.
- OS shell stores the latest context per window/app/installation.
- OS Agent reads that shell-stored context and dispatches actions through the same bridge.
Do not solve this with localStorage, DOM scraping, or direct iframe inspection.
The supported path is explicit postMessage state/action bridging.
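Since the bridge methods are not yet available, app code can be written against an interface and exercised with an in-memory stand-in. The sketch below assumes the bridge will expose exactly the two method names listed above; the handler signature, AgentBridge, FakeShell and wireApp are all hypothetical, and the real platform-js API may differ.

```typescript
type Handler = (params: Record<string, unknown>) => void

// Assumed shape of the future platform.agent API, based on the two method names above.
interface AgentBridge<C> {
  updateContext(context: C): void
  registerAction(actionId: string, handler: Handler): void
}

// In-memory stand-in for the OS shell, for local tests only.
class FakeShell<C> implements AgentBridge<C> {
  latest: C | undefined
  private handlers = new Map<string, Handler>()
  updateContext(context: C): void { this.latest = context }
  registerAction(actionId: string, handler: Handler): void { this.handlers.set(actionId, handler) }
  // Simulates the OS Agent dispatching an action; false if nothing is registered.
  invoke(actionId: string, params: Record<string, unknown> = {}): boolean {
    const handler = this.handlers.get(actionId)
    if (!handler) return false
    handler(params)
    return true
  }
}

interface DemoContext { input: string; running: boolean }

// App wiring: one state object, pushed to the bridge after every change.
function wireApp(bridge: AgentBridge<DemoContext>): void {
  let state: DemoContext = { input: '', running: false }
  const setState = (next: DemoContext) => { state = next; bridge.updateContext(state) }
  bridge.registerAction('set_input', (p) => setState({ ...state, input: String(p.value ?? '') }))
  bridge.registerAction('start', () => setState({ ...state, running: true }))
  bridge.registerAction('cancel', () => setState({ ...state, running: false }))
  bridge.updateContext(state) // initial snapshot
}
```

Swapping FakeShell for the real bridge later should then only touch the call site of wireApp.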
Manifest declarations
Every app must declare Agent capabilities in platform.json:
{
"agent_actions": [
{
"id": "get_context",
"description": "Returns the complete AgentContext snapshot for the current app window."
},
{
"id": "set_input",
"description": "Sets the main input text. Params: { value: string }"
},
{
"id": "start",
"description": "Starts the main workflow using the current state."
},
{
"id": "cancel",
"description": "Cancels the running workflow if one is active."
},
{
"id": "reset",
"description": "Resets the workflow to its initial state."
}
]
}
Use exact action IDs from the Action contract. The indexer reads
platform.json.agent_actions into ipos_app_capability_index.agent_hints, so
missing declarations make the app invisible to the Agent.
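A simple guard against drift between code and manifest is to compare the action IDs the app registers with those declared in platform.json. A minimal sketch; undeclaredActions is a hypothetical helper, and how you collect the registered IDs depends on your app.

```typescript
interface Manifest {
  agent_actions?: Array<{ id: string; description: string }>
}

// Returns action IDs the app uses (plus get_context) that the manifest does not declare.
function undeclaredActions(manifest: Manifest, registered: string[]): string[] {
  const declared = new Set((manifest.agent_actions ?? []).map((a) => a.id))
  const required = Array.from(new Set(['get_context', ...registered]))
  return required.filter((id) => !declared.has(id))
}
```

Running this in a unit test or a prebuild step turns an "invisible to the Agent" failure into a build-time error.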
Example: g1002 required state/actions
State:
- rawInput
- parsed patent list and validation errors
- delayMs
- pushDb
- phase
- per-row status, size, source, error
- cancel/running status
- success/failure/total counts
- whether each action is currently available
Actions:
- set_input({ value })
- set_delay_ms({ value })
- set_push_db({ value })
- start()
- cancel()
- reset()
Without the static-app platform-js bridge, g1002 can be opened by the Agent but cannot be considered Agent-controllable.
Standard commands: session persistence
These two postMessage shapes are used when the app declares
"session_persistence": true in its manifest. They are OS-level contracts, not
Agent-level actions, but they share the same postMessage transport and are
documented here for completeness.
session.snapshot (app → OS)
Required for: apps that declare session_persistence: true in the manifest.
Direction: app posts to parent (OS shell).
Shape: { kind: 'ipos:request', type: 'session.snapshot', id, payload: { state: unknown, schemaVersion: number } }
When to send: any time persistable app state has changed. The OS debounces to one write per second per window, so over-eager calls are safe.
Returns: standard ipos:response ack with { ok: true }. Errors (e.g.
oversize payload >64KB) are silently dropped server-side — the previous good
snapshot remains.
ipos.session.restore (OS → app)
Required for: apps that declare session_persistence: true.
Direction: OS posts to app iframe.
Shape: { type: 'ipos.session.restore', windowId, payload: { state: unknown, schemaVersion: number } }
When received: once, after the iframe completes its handshake request, if
a prior snapshot exists. Not sent when the user has never used the app before.
Behavior: apply or migrate the prior state. Throwing inside the handler is a silent no-op (OS does not retry); apps should fail gracefully and continue with the default empty state.
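The two message shapes suggest a small pair of helpers: one to build the snapshot request, one to apply a restored payload defensively. This is a sketch, not the platform-js implementation; buildSnapshotRequest, restoreState and the per-version migrations table are hypothetical, and only the message fields come from the shapes above.

```typescript
interface SessionPayload { state: unknown; schemaVersion: number }

// Builds the app → OS message in the shape documented above.
function buildSnapshotRequest(id: string, state: unknown, schemaVersion: number) {
  return { kind: 'ipos:request' as const, type: 'session.snapshot' as const, id, payload: { state, schemaVersion } }
}

// Migrations keyed by the schema version they upgrade FROM.
type Migration = (state: unknown) => unknown

// Applies migrations up to currentVersion; falls back to defaults on any problem,
// because the OS does not retry a failed restore.
function restoreState<S>(
  payload: SessionPayload,
  currentVersion: number,
  migrations: Record<number, Migration>,
  isValid: (s: unknown) => s is S,
  fallback: S,
): S {
  try {
    let state = payload.state
    for (let v = payload.schemaVersion; v < currentVersion; v++) {
      const migrate = migrations[v]
      if (!migrate) return fallback
      state = migrate(state)
    }
    // A snapshot from a newer schema than this build understands is ignored.
    return payload.schemaVersion <= currentVersion && isValid(state) ? state : fallback
  } catch {
    return fallback
  }
}
```

The handler for ipos.session.restore would pass message.payload to restoreState and feed the result into the app's initial state.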
Verification checklist
- [ ] platform.json.agent_actions lists context plus every action.
- [ ] Agent can retrieve current state after opening the app.
- [ ] State includes all visible inputs, selected items, progress, results, and errors.
- [ ] Agent can invoke every UI control through an action.
- [ ] Actions return updated state.
- [ ] Disabled actions return a reason.
- [ ] Sensitive/large fields are summarized and documented.
- [ ] Static app uses platform-js bridge; server app uses shell-authenticated /api/agent/* endpoints.
- [ ] If session_persistence: true: both session.snapshot emission and ipos.session.restore handling are implemented and tested across schema bumps.
Reference: Route Design Patterns
Route Design Patterns
Recipes for the most common platform route scenarios, ordered from simplest to most complex. For env var setup, see gotchas #16. For the CF Worker deploy flow, see gotchas #8 and #15.
Pattern A: Simple CORS proxy
Use when: External API works from any server but browser fetch() fails with CORS error (no Access-Control-Allow-Origin header).
Route file template
// app/api/platform/v1/<route-name>/route.ts
import { NextRequest } from 'next/server'
import { buildRequestContext } from '@/lib/platform/build-request-context'
import { errJson } from '@/lib/platform/err-json'
export async function POST(req: NextRequest) {
const ctx = await buildRequestContext(req)
if (!ctx.ok) return ctx.errorResponse
let body: { field: string }
try { body = await req.json() } catch { return errJson('BAD_REQUEST', 'invalid JSON', 400) }
if (!body.field) return errJson('BAD_REQUEST', 'missing field', 400)
try {
const upstream = await fetch('https://api.example.com/endpoint', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ field: body.field }),
signal: AbortSignal.timeout(25_000), // Zeabur limit is 30s; 25s leaves room
})
if (!upstream.ok) {
const text = await upstream.text().catch(() => '')
return errJson('UPSTREAM_ERROR', `upstream ${upstream.status}: ${text.slice(0, 200)}`, 503)
}
// JSON response variant:
return Response.json({ data: await upstream.json() })
// Binary response variant (PDF, etc.):
// return new Response(await upstream.arrayBuffer(), {
// headers: { 'Content-Type': upstream.headers.get('Content-Type') ?? 'application/pdf' },
// })
} catch (err) {
return errJson('PROXY_ERROR', err instanceof Error ? err.message : 'network error', 503)
}
}
Checklist:
- buildRequestContext always runs first: validates caller identity
- AbortSignal.timeout(25_000): Zeabur hard-kills at 30s; leave 5s margin
- Return 503 (not 502) on upstream failure: Cloudflare strips 502 bodies (gotchas #7)
- text.slice(0, 200): caps upstream error so sensitive page HTML doesn't leak to the caller
Pattern B: IP-blocked external API (CF Worker egress proxy)
Use when: External API returns 403 from Zeabur's datacenter IPs but works from your local machine. User-Agent alone does not fix it.
How to confirm: curl <url> locally → 200. Same curl via Zeabur exec → 403 or connection refused.
Examples: USPTO full-text images, some government portals.
Step 1 — Write the CF Worker
// infra/cf-workers/<upstream>-proxy/worker.js
export default {
async fetch(req) {
const u = new URL(req.url)
const path = u.pathname.replace(/^\//, '') // strip leading slash; pass the rest as upstream path
const upstream = await fetch(`https://upstream.example.com/${path}`, {
headers: { 'User-Agent': 'Mozilla/5.0 (compatible; YourApp/1.0)' },
cf: { cacheEverything: true, cacheTtl: 86400 },
})
return new Response(upstream.body, {
status: upstream.status,
headers: {
'Content-Type': upstream.headers.get('Content-Type') ?? 'application/octet-stream',
'Cache-Control': 'public, max-age=86400',
},
})
},
}
# infra/cf-workers/<upstream>-proxy/wrangler.toml
name = "<upstream>-proxy"
main = "worker.js"
compatibility_date = "2024-01-01"
# Deploy (once, not part of app deploy cycle)
cd infra/cf-workers/<upstream>-proxy
CLOUDFLARE_API_TOKEN=<token> npx wrangler deploy
# Prints: https://<upstream>-proxy.<sub>.workers.dev ← copy this
Step 2 — Set the env var in Zeabur
Naming convention: <UPSTREAM>_PROXY_URL in ALL_CAPS_SNAKE_CASE.
# Use gotchas #16 curl mutation, or Zeabur dashboard
<UPSTREAM>_PROXY_URL=https://<upstream>-proxy.<sub>.workers.dev
Step 3 — Write the server helper
// lib/platform/<upstream>.ts
export async function fetchFromUpstream(path: string): Promise<ArrayBuffer> {
const proxyBase = process.env.<UPSTREAM>_PROXY_URL?.replace(/\/$/, '')
// Local dev: proxyBase is unset → falls back to direct (works from laptop, not from Zeabur)
const url = proxyBase
? `${proxyBase}/${path}`
: `https://upstream.example.com/${path}`
const res = await fetch(url, { signal: AbortSignal.timeout(25_000) })
if (!res.ok) throw new Error(`upstream ${res.status} for ${path}`)
return res.arrayBuffer()
}
Step 4 — Use the helper from the route
Call fetchFromUpstream(path) inside the route's try block, same as Pattern A.
Live example: infra/cf-workers/uspto-proxy/ + lib/platform/uspto.ts
Pattern C: Two-step scrape proxy
Use when: The external API has no direct download URL — you must fetch a page first, parse a URL from it, then fetch the actual resource.
Example: Google Patents — the patent page HTML contains <meta name="citation_pdf_url" content="...">. You parse that to get the real PDF URL.
Worker (two fetches per user request)
// infra/cf-workers/google-patents-proxy/worker.js
export default {
async fetch(req) {
const u = new URL(req.url)
const patentId = u.pathname.replace(/^\//, '') // e.g. "US7654321B2"
// Step 1: fetch the HTML patent page
const pageRes = await fetch(`https://patents.google.com/patent/${patentId}`, {
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept': 'text/html,application/xhtml+xml',
},
})
if (!pageRes.ok) return new Response(`page fetch failed: ${pageRes.status}`, { status: pageRes.status })
// Step 2: parse the PDF URL out of the meta tag
const html = await pageRes.text()
const match = html.match(/<meta\s+name="citation_pdf_url"\s+content="([^"]+)"/)
if (!match) return new Response('no citation_pdf_url meta tag found', { status: 404 })
// Step 3: fetch the actual PDF
const pdfRes = await fetch(match[1], {
headers: { 'User-Agent': 'Mozilla/5.0 (compatible)' },
cf: { cacheEverything: true, cacheTtl: 3600 },
})
if (!pdfRes.ok) return new Response(`pdf fetch failed: ${pdfRes.status}`, { status: pdfRes.status })
return new Response(pdfRes.body, {
headers: { 'Content-Type': 'application/pdf', 'Cache-Control': 'public, max-age=3600' },
})
},
}
Fragility note: Two-step workers depend on the source page's HTML structure. If the meta tag format changes, the worker silently 404s. Add a comment in the route README about what to check if it breaks.
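Part of that fragility comes from the regex itself, which requires name before content, double quotes, and no extra attributes in between. A somewhat more tolerant extractor is sketched below; extractCitationPdfUrl is a hypothetical helper, not the code in the deployed worker, and it still breaks if the page stops emitting the meta tag at all.

```typescript
// Finds <meta name="citation_pdf_url" content="..."> regardless of attribute
// order, quote style, or extra attributes. Returns null when the tag is absent.
function extractCitationPdfUrl(html: string): string | null {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? []
  for (const tag of tags) {
    if (!/\bname\s*=\s*["']citation_pdf_url["']/i.test(tag)) continue
    const content = /\bcontent\s*=\s*["']([^"']+)["']/i.exec(tag)
    if (content) return content[1] ?? null
  }
  return null
}
```

Keeping the parser as a pure function also means a saved copy of a patent page can serve as a regression fixture.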
Live example: infra/cf-workers/google-patents-proxy/
Pattern D: Multi-source fallback
Use when: The same data can be fetched from multiple sources, and you want resilience when the primary fails.
Client-side vs server-side — which to pick
| | Client-side (zip) | Server-side (platform route) |
|---|---|---|
| How calls are made | Separate fetch() per source | Route tries all sources internally |
| User sees which source worked | Yes: show progress per source | No: one call, returns result or error |
| Changing source order | Requires zip redeploy | Push to main; no zip change |
| Debugging | Source name visible in UI | Logged server-side only |
| Use when | User benefits from seeing per-source status | User just needs the file; hide complexity |
Client-side fallback (in the zip)
// apps/<slug>-app/src/lib/try-sources.ts
interface Source { name: string; fetch: () => Promise<ArrayBuffer> }
export async function downloadWithFallback(
sources: Source[],
): Promise<{ buffer: ArrayBuffer; source: string }> {
const errors: string[] = []
for (const { name, fetch } of sources) {
try {
const buffer = await fetch()
return { buffer, source: name }
} catch (err) {
errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`)
}
}
throw new Error(errors.join(' | '))
}
// Usage:
const { buffer, source } = await downloadWithFallback([
{ name: 'primary', fetch: () => fetchFromPrimary(id) },
{ name: 'fallback', fetch: () => fetchFromFallback(id) },
])
Server-side fallback (inside platform route)
// Inside the POST handler
const sources = [fetchFromPrimary, fetchFromFallback, fetchFromTertiary]
let lastErr = new Error('no sources')
for (const fetch of sources) {
try { return new Response(await fetch(id), { headers: { 'Content-Type': 'application/pdf' } }) }
catch (err) { lastErr = err instanceof Error ? err : new Error(String(err)) }
}
return errJson('ALL_SOURCES_FAILED', lastErr.message, 503)
Live example (client-side): apps/g1002-app/src/lib/try-sources.ts — Google Patents (with suffix) → Google Patents (bare) → USPTO, with per-row progress shown to user.
Pattern E: AI analysis via AI Actions system + file storage
Use when: The app needs to call an AI model (text, vision, or multimodal) and optionally save output files to 檔案總管 (File Explorer, backed by the app_files table).
Why server-side: AI provider credentials are held by the platform. The browser never touches an API key.
How IPOS AI works
The platform has an AI Actions framework. Each "action" is a row in the ai_actions table:
| Column | Meaning |
|---|---|
| slug | Unique ID the route uses to look up the action (scan-doc-analyze, ocr, …) |
| provider_id | Points to ai_providers row (OpenRouter, BazaarLink, …) |
| model_id | Model string sent to the provider (google/gemini-2.5-flash, etc.) |
| is_enabled | Gate: if false, route returns AI_NOT_ENABLED; app shows "AI 尚未啟動,請洽管理員" |
| system_prompt | Default system prompt; route can override per-call |
| config | JSON blob for extra provider settings |
Admin configures actions at https://ipdesk.ai/admin/ai/actions. App developers never configure providers or API keys — they only know the action slug.
Providers (ai_providers table) store base_url + encrypted api_key. Both OpenRouter and BazaarLink expose an OpenAI-compatible API, so the same client code works for any model they route to (Gemini, Claude, GPT-4o, etc.).
Platform-side: call executeAction
// app/api/platform/v1/<slug>/analyze/route.ts
import { NextRequest } from 'next/server'
import { buildRequestContext } from '@/lib/platform/build-request-context'
import { errJson } from '@/lib/platform/err-json'
import { executeAction } from '@/lib/ai/actions'
export async function POST(req: NextRequest) {
const ctx = await buildRequestContext(req)
if (!ctx.ok) return ctx.errorResponse
let body: { text?: string }
try { body = await req.json() } catch { return errJson('BAD_REQUEST', 'invalid JSON', 400) }
if (!body.text) return errJson('BAD_REQUEST', 'text is required', 400)
try {
const result = await executeAction(
'your-action-slug', // matches ai_actions.slug
ctx.orgId, // for usage logging
body.text, // user message
)
return Response.json({ data: { content: result.content } })
} catch (err) {
const msg = err instanceof Error ? err.message : 'unknown error'
if (msg.includes('disabled') || msg.includes('not found')) {
return errJson('AI_NOT_ENABLED', msg, 503)
}
return errJson('AI_ERROR', msg, 500)
}
}
executeAction (in lib/ai/actions.ts) handles everything: slug lookup, is_enabled check, key decryption, OpenAI-compatible call, usage logging. If is_enabled is false it throws — the route maps that to AI_NOT_ENABLED HTTP 503.
App side: handle AI_NOT_ENABLED
const res = await fetch('/api/platform/v1/<slug>/analyze', {
method: 'POST',
credentials: 'include', // required for iframe → platform route auth
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text: userInput }),
})
const { data, error } = await res.json()
if (!res.ok || error) {
if (error?.code === 'AI_NOT_ENABLED') {
setError('AI 功能尚未啟動,請洽管理員')
} else {
setError(error?.message ?? `Error ${res.status}`)
}
return
}
// use data.content
Every AI-powered app must handle AI_NOT_ENABLED and display a friendly message. Never show raw API errors to the user.
Processing / splitting route template (no AI — pure compute)
When the second route just transforms files without AI (e.g. PDF splitting), it uses no AI Actions:
// app/api/platform/v1/<slug>/split/route.ts
import { NextRequest } from 'next/server'
import { PDFDocument as PdfLibDoc } from 'pdf-lib'
import { buildRequestContext } from '@/lib/platform/build-request-context'
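// Local error helper: returns the same { data, error } envelope the app-side callers read.
function err(code: string, message: string, status = 400) {
return Response.json({ data: null, error: { code, message } }, { status })
}
export async function POST(req: NextRequest) {
const ctx = await buildRequestContext(req)
if (!ctx.ok) return ctx.errorResponse
// Guard the JSON parse so a malformed body yields 400 instead of an unhandled 500.
let body: { pdf_b64?: string; documents?: Array<{ filename: string; pages: number[] }> }
try { body = await req.json() } catch { return err('BAD_REQUEST', 'invalid JSON') }
const { pdf_b64, documents } = body
if (!pdf_b64 || !Array.isArray(documents) || documents.length === 0)
return err('BAD_REQUEST', 'pdf_b64 and documents[] required')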
const srcDoc = await PdfLibDoc.load(Buffer.from(pdf_b64, 'base64'), { ignoreEncryption: true })
const files: { filename: string; pdf_b64: string }[] = []
for (const doc of documents) {
if (!Array.isArray(doc.pages) || doc.pages.length === 0)
return err('BAD_REQUEST', `pages must be a non-empty array for "${doc.filename}"`)
const outDoc = await PdfLibDoc.create()
const copied = await outDoc.copyPages(srcDoc, doc.pages.map((p: number) => p - 1))
copied.forEach((page) => outDoc.addPage(page))
const bytes = await outDoc.save()
files.push({ filename: doc.filename, pdf_b64: Buffer.from(bytes).toString('base64') })
}
return Response.json({ data: { files } })
}
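The template converts 1-indexed pages to 0-indexed with p - 1 but does not check that each page exists in the source PDF, so an out-of-range page would surface as an unhandled error from copyPages. A small validation step is sketched below; invalidPages is a hypothetical helper, and in the route the pageCount argument would come from pdf-lib's srcDoc.getPageCount().

```typescript
// Returns the entries that are not valid 1-indexed page numbers for a
// document with pageCount pages (non-integers, < 1, or > pageCount).
function invalidPages(pages: number[], pageCount: number): number[] {
  return pages.filter((p) => !Number.isInteger(p) || p < 1 || p > pageCount)
}

// Converts validated 1-indexed pages to the 0-indexed form copyPages expects,
// preserving the caller's order (which is what enables page reordering).
function toZeroBased(pages: number[]): number[] {
  return pages.map((p) => p - 1)
}
```

In the route, a non-empty result from invalidPages would map naturally to a BAD_REQUEST response naming the offending pages.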
App side: save output files to 檔案總管
After the split/processing route returns files, upload each one via the platform SDK:
for (const file of splitResult.files) {
await platform.appFileUpload({
name: `${file.filename}.pdf`,
mime_type: 'application/pdf',
content_b64: file.pdf_b64,
metadata: { source: 'your-app-slug', original_file: uploadedFileName },
})
}
platform.appFileUpload() requires org_write: ["app_files"] in platform.json.permissions. Files appear immediately in 檔案總管 for that org.
Multimodal / Vision (PDF inline to AI)
executeAction handles text-only prompts. For Vision/multimodal (sending a PDF or image to the AI), a future executeActionMultimodal() function will be added to lib/ai/actions.ts. Until then, multimodal routes may use a platform-managed env var directly (e.g. GEMINI_API_KEY on Zeabur) as a temporary shortcut — but this bypasses the AI Actions system (no is_enabled gate, no admin model selection, no usage logging).
Tech debt note: The current scan-doc/analyze route uses this shortcut (direct GEMINI_API_KEY). Once executeActionMultimodal() is available it should be refactored. New multimodal routes should wait for executeActionMultimodal() rather than repeating the shortcut.
If using the shortcut temporarily, the SchemaType enum gotcha still applies — see the test mock pattern below.
Test mock for @google/generative-ai (shortcut routes only)
vi.mock('@google/generative-ai', () => ({
SchemaType: { // REQUIRED — omitting causes SchemaType.ARRAY to be undefined at import
ARRAY: 'array',
OBJECT: 'object',
STRING: 'string',
INTEGER: 'integer',
BOOLEAN: 'boolean',
NUMBER: 'number',
},
GoogleGenerativeAI: vi.fn().mockImplementation(() => ({
getGenerativeModel: vi.fn().mockReturnValue({
generateContent: vi.fn().mockResolvedValue({
response: { text: () => JSON.stringify([{ title: 'Test Doc', start_page: 1, end_page: 2 }]) },
}),
}),
})),
}))
Admin setup checklist (when building a new AI route)
- Decide the action slug (e.g. invoice-extract, contract-summarize)
- Ask the platform admin to create the row in ai_actions at /admin/ai/actions:
  - slug: your chosen slug
  - provider_id: pick OpenRouter or BazaarLink
  - model_id: the model string (e.g. google/gemini-2.5-flash)
  - is_enabled: true to activate
  - system_prompt: the domain-specific instruction
- Call executeAction(slug, orgId, userMessage) from your route
- Your app handles AI_NOT_ENABLED (shows "AI 尚未啟動,請洽管理員", i.e. "AI is not enabled yet; please contact your administrator")
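On the app side, handling AI_NOT_ENABLED can be as small as mapping the error envelope to display text. The sketch below assumes the route returns the `{ data, error: { code, message } }` envelope used by the routes in this document; `aiErrorText` and the `Envelope` type are hypothetical names introduced for illustration.

```typescript
// Envelope shape assumed from the route examples in this document.
type Envelope<T> = { data: T | null; error?: { code: string; message: string } | null }

// Hypothetical helper: returns the text the UI should show, or null on success.
function aiErrorText<T>(res: Envelope<T>): string | null {
  if (!res.error) return null
  if (res.error.code === 'AI_NOT_ENABLED') return 'AI 尚未啟動,請洽管理員'
  return res.error.message // any other code: fall back to the server message
}

const sample = aiErrorText({ data: null, error: { code: 'AI_NOT_ENABLED', message: 'disabled' } })
console.log(sample)
```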
Live example: app/api/platform/v1/scan-doc/analyze/route.ts and split/route.ts — scan-doc is a worked example of this pattern (note: analyze currently uses the direct-key shortcut; split has no AI). See references/scan-doc-walkthrough.md.
Test recipe
Every new platform route needs:
1. Unit tests for the helper (tests/<slug>/<helper>.test.ts)
import { vi, describe, it, expect, beforeEach } from 'vitest'
import { fetchFromUpstream } from '../../lib/platform/<upstream>'
beforeEach(() => { vi.restoreAllMocks(); vi.unstubAllGlobals() }) // restoreAllMocks alone does not undo vi.stubGlobal
describe('fetchFromUpstream', () => {
it('returns buffer on 200', async () => {
vi.stubGlobal('fetch', async () => new Response(new Uint8Array([1, 2, 3]).buffer))
const buf = await fetchFromUpstream('test-path')
expect(buf.byteLength).toBe(3)
})
it('throws with status code on non-200', async () => {
vi.stubGlobal('fetch', async () => new Response(null, { status: 404 }))
await expect(fetchFromUpstream('bad')).rejects.toThrow('404')
})
it('respects <UPSTREAM>_PROXY_URL env var', async () => {
process.env.<UPSTREAM>_PROXY_URL = 'https://proxy.example.com'
let calledUrl = ''
vi.stubGlobal('fetch', async (url: string) => { calledUrl = url; return new Response(new ArrayBuffer(0)) })
await fetchFromUpstream('foo/bar').catch(() => {})
expect(calledUrl).toContain('proxy.example.com/foo/bar')
delete process.env.<UPSTREAM>_PROXY_URL
})
})
2. Smoke test (manual, run after each Zeabur deploy)
TOKEN='ipos_live_<your-token>'
# Happy path
curl -s -X POST https://ipdesk.ai/api/platform/v1/<route-name> \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"<field>": "<test-value>"}' \
-o /tmp/result.bin \
-w "HTTP=%{http_code} BYTES=%{size_download} CT=%{content_type}\n"
# For PDF responses: check magic bytes
head -c 4 /tmp/result.bin | xxd # should show: 25 50 44 46 = %PDF
# Error path — bad input should return 400 JSON, not 500 HTML
curl -s -X POST https://ipdesk.ai/api/platform/v1/<route-name> \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{}' | jq .
Env var naming convention
| Type | Convention | Example |
|---|---|---|
| CF Worker egress proxy URL | <UPSTREAM>_PROXY_URL | USPTO_PROXY_URL, GOOGLE_PATENTS_PROXY_URL |
| Third-party API key (held by IPOS) | <SERVICE>_API_KEY | MISTRAL_API_KEY |
| Feature flag / toggle | <FEATURE>_ENABLED | PRIOR_ART_PUSH_ENABLED |
Always set via Zeabur API mutation (gotchas #16) or the Zeabur dashboard — never commit production values into .env.local or the codebase.
After setting a new env var, trigger a redeploy: push a noop commit, or use zeabur-restart skill if no code change is needed.
Reference: Gotchas
Gotchas — Real Issues Hit During Deploys
Every entry here was debugged from a real failure. Each entry: symptom → root cause → fix. New one? Add it after fixing it.
1. PowerShell zip → "Malicious entry: 404\index.html"
Symptom: DevConsole upload shows Malicious entry: 404\index.html (note the backslash) in red.
Root cause: Windows PowerShell Compress-Archive writes ZIP entries with backslash separators (out\404\index.html). The IPOS extractor flags backslash entries as path-traversal attempts.
Fix: Use bash zip instead (Git Bash / WSL works on Windows).
# Wrong (PowerShell)
Compress-Archive -Path platform.json,out -DestinationPath app.zip # ❌ backslashes
# Right (bash zip)
cd apps/<slug>-app
zip -r ../<slug>-<version>.zip platform.json out/ # ✅ forward slashes
Verify with unzip -l app.zip — entries should look like out/_next/..., never out\_next\....
2. ZIP structure: out/ must be a sub-folder, not the root contents
Symptom: Build fails with npm error code ENOENT ... package.json — the static-adapter doesn't find out/ and falls back to running npm install (which then fails because no package.json is in the zip).
Root cause: lib/app-deploy/static-adapter.ts looks for <extractedDir>/out/. If you zipped the contents of out/ directly (so index.html is at zip root), there's no out/ folder and the adapter thinks it needs to build.
Fix: Zip with out/ as a sub-folder, alongside platform.json:
# Wrong: zipped contents of out/ at zip root
cd apps/<slug>-app/out && zip -r ../app.zip . # ❌
# Right: zip platform.json + out/ sibling
cd apps/<slug>-app && zip -r ../app.zip platform.json out/ # ✅
Verify with unzip -l app.zip | head — first lines should be platform.json and out/.
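The checks from gotchas 1 and 2 can be combined into one pre-upload test over the entry names that `unzip -l` prints. This is only a sketch: `checkZipEntries` is a hypothetical helper, and it covers just the two failure modes described above, not everything the IPOS extractor validates.

```typescript
// Hypothetical pre-upload check over zip entry names.
function checkZipEntries(entries: string[]): string[] {
  const problems: string[] = []
  if (entries.some((e) => e.includes('\\'))) {
    problems.push('backslash separators found (zip was probably built with Compress-Archive)')
  }
  if (!entries.includes('platform.json')) {
    problems.push('platform.json is missing from the zip root')
  }
  if (!entries.some((e) => e.startsWith('out/'))) {
    problems.push('no out/ sub-folder (were the contents of out/ zipped at the root?)')
  }
  return problems
}

// A zip built with PowerShell Compress-Archive fails two of the checks.
console.log(checkZipEntries(['platform.json', 'out\\404\\index.html']))
```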
3. Same deploymentId returned on retry — old failed deploy, not new
Symptom: You uploaded a fixed zip but GET /api/ext/v1/deployments/<id> still shows the old failed status. The deploy endpoint returns the same deploymentId you got the first time.
Root cause: startDeployment is idempotent on (slug, version). Same version + new zip = same deploymentId returned (the original — including FAILED ones).
Fix: Bump version in platform.json before re-uploading. Even just 1.0.0 → 1.0.1. Build → zip → deploy → new deploymentId.
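A patch bump is easy to script so it is never forgotten before a re-upload. The sketch below assumes the `version` field in platform.json is a plain `x.y.z` string, as in the 1.0.0 → 1.0.1 example; `bumpPatch` is a hypothetical helper name.

```typescript
// Hypothetical helper: increments the last component of an x.y.z version.
function bumpPatch(version: string): string {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!m) throw new Error(`not a plain x.y.z version: ${version}`)
  return `${m[1]}.${m[2]}.${Number(m[3]) + 1}`
}

console.log(bumpPatch('1.0.0')) // 1.0.1
```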
4. <table> does not allow inserts via SDK (read-only table)
Symptom: App calls platform.db('org:<table>').insert(...) and gets back { error: { code: 'read_only', message: '<table> does not allow inserts via SDK' } }.
Root cause: lib/platform/org-table-config.ts doesn't list insertable_columns for that table → INSERT is disabled by default.
Fix: Edit lib/platform/org-table-config.ts, add insertable_columns: [...] to the table entry, including only columns the app legitimately needs to write. id, org_id, created_at are ALWAYS server-managed and never go in the whitelist. Then push to main and let Zeabur redeploy.
// Before
prior_art: {
columns: [...],
order_by: '...',
default_limit: 500,
},
// After
prior_art: {
columns: [...],
order_by: '...',
default_limit: 500,
insertable_columns: ['patent_number', 'country', 'title'],
},
5. column "..." is not in insertable_columns (column whitelist)
Symptom: 400 { code: 'bad_request', message: 'column "source" is not in insertable_columns for prior_art' }.
Root cause: App is trying to insert a column not in the whitelist. Either the column doesn't exist on the table at all, or it does exist but isn't safe to expose via SDK.
Fix: Either drop the column from your insert payload (if it's bogus — e.g. source: 'g1001' when prior_art has no source column), or add it to insertable_columns if legitimately needed. Check the migration for the table to see what columns actually exist.
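While deciding which of the two fixes applies, it can help to see exactly which keys of a payload fall outside the whitelist. A small sketch, assuming the app keeps its own copy of the column list (the authoritative list lives server-side in lib/platform/org-table-config.ts); `splitByWhitelist` is a hypothetical helper.

```typescript
// Hypothetical helper: separates whitelisted columns from the rest.
function splitByWhitelist(
  row: Record<string, unknown>,
  insertable: readonly string[],
): { allowed: Record<string, unknown>; rejected: string[] } {
  const allowed: Record<string, unknown> = {}
  const rejected: string[] = []
  for (const [key, value] of Object.entries(row)) {
    if (insertable.includes(key)) allowed[key] = value
    else rejected.push(key)
  }
  return { allowed, rejected }
}

// The bogus `source` column from the symptom above ends up in `rejected`.
const split = splitByWhitelist(
  { patent_number: '9943073', country: 'US', source: 'g1001' },
  ['patent_number', 'country', 'title'],
)
console.log(split.rejected) // [ 'source' ]
```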
6. SDK .insert(array) silently drops all but first row
Symptom: App passes an array to .insert(). Either the route 400s ("data must be object") or only one row lands in the DB.
Root cause: platform.db().insert() accepts a single Record<string, unknown>, not an array. The SDK type signature is (data: Record<string, unknown>): Promise<T>. TypeScript will still compile if you force an array through the signature with an `as unknown as Record<string, unknown>` cast; the route then receives an array as body.data and rejects it.
Fix: Loop:
for (const row of rows) {
try {
await platform.db('org:<table>').insert(row)
} catch (err) {
const msg = err instanceof Error ? err.message : String(err)
if (/duplicate key|unique constraint/i.test(msg)) continue // dedupe
throw err
}
}
For genuinely large batches needing transactional semantics, use the dedicated bulk endpoint (e.g. /api/platform/v1/prior-art/bulk) — but those are Bearer-authed, not shell-token; only useful from headless callers.
7. Cloudflare strips 502 response bodies
Symptom: Platform route is supposed to return JSON error envelope, but the client sees Cloudflare's "error code: 502" HTML page. Logs show the route ran and returned errJson('PROXY_ERROR', msg, 502) correctly.
Root cause: Cloudflare in front of the origin replaces 502 responses from origin with its own bad-gateway HTML page, throwing away your JSON body.
Fix: Don't return 502 from your origin code — return 503 instead.
// Wrong
return errJson('PROXY_ERROR', msg, 502) // ❌ CF eats the body
// Right
return errJson('PROXY_ERROR', msg, 503) // ✅ CF passes through
Same applies to other CF "interception" status codes if you find them. 503 is reliably proxied.
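One way to keep a stray 502 from slipping through is to normalize the status in a single place before building the response. This is a sketch only: `originSafeStatus` is a hypothetical helper, and the set contains just 502 because that is the only intercepted code confirmed here.

```typescript
// Status codes the origin should avoid returning. Only 502 is confirmed in
// this document; extend the set if other intercepted codes turn up.
const CF_INTERCEPTED = new Set<number>([502])

// Hypothetical helper: maps an intercepted status to 503, leaves others alone.
function originSafeStatus(status: number): number {
  return CF_INTERCEPTED.has(status) ? 503 : status
}

console.log(originSafeStatus(502)) // 503
console.log(originSafeStatus(400)) // 400
```

A helper like `errJson` could call this on its status argument so individual routes do not have to remember the rule.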
8. External API blocks Zeabur datacenter IP (403)
Symptom: Platform route returns 502 / 503 PROXY_ERROR upstream 403. Curling the external API from your local machine returns 200, but from Zeabur returns 403. User-Agent doesn't help.
Root cause: External service has IP-range blocking on cloud-provider ASNs. Common offenders: gov sites (USPTO), some CDN-protected APIs.
Fix: Run a Cloudflare Worker as an egress proxy. CF edge IPs are not blocked. Pattern:
// Worker (worker.js)
export default {
async fetch(req) {
const u = new URL(req.url)
const path = u.pathname.replace(/^\//, '')
const r = await fetch(`https://upstream.example.com/${path}`, {
headers: { 'User-Agent': 'Mozilla/5.0 (compatible; YourApp/1.0)' },
cf: { cacheEverything: true, cacheTtl: 86400 },
})
return new Response(r.body, {
status: r.status,
headers: { 'Content-Type': r.headers.get('Content-Type') ?? 'application/octet-stream' },
})
},
}
Deploy: CLOUDFLARE_API_TOKEN=<token> npx wrangler deploy. Set <UPSTREAM>_PROXY_URL=https://<name>.<sub>.workers.dev on Zeabur. Have your platform route use process.env.<UPSTREAM>_PROXY_URL when it is set, and otherwise call the upstream directly.
Live example: infra/cf-workers/uspto-proxy/ + lib/platform/uspto.ts (env var USPTO_PROXY_URL).
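The "proxy when set, direct otherwise" rule can be isolated in a small pure function, which also makes it easy to unit test. The sketch below is an assumption about shape, not the code in lib/platform/uspto.ts; `resolveUpstreamUrl` and its parameters are hypothetical.

```typescript
// Hypothetical helper: builds the URL to fetch, preferring the proxy base.
function resolveUpstreamUrl(path: string, directBase: string, proxyBase?: string): string {
  const chosen = proxyBase && proxyBase.trim() !== '' ? proxyBase : directBase
  const base = chosen.replace(/\/+$/, '') // drop trailing slashes
  return `${base}/${path.replace(/^\/+/, '')}` // drop leading slashes on the path
}

console.log(resolveUpstreamUrl('foo/bar', 'https://upstream.example.com', 'https://proxy.example.com'))
// https://proxy.example.com/foo/bar
console.log(resolveUpstreamUrl('foo/bar', 'https://upstream.example.com'))
// https://upstream.example.com/foo/bar
```

In a route, the third argument would be the value of the relevant `<UPSTREAM>_PROXY_URL` environment variable.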
9. iframe download blocked: "Download is disallowed"
Symptom: Console shows Download is disallowed. The frame initiating or instantiating the download is sandboxed. Code does <a href={url} download>...</a> or URL.createObjectURL + a.click().
Root cause: components/os/Window.tsx iframe sandbox doesn't include allow-downloads.
Fix: Add allow-downloads to the sandbox attribute:
// Before
sandbox="allow-scripts allow-forms allow-same-origin allow-popups"
// After
sandbox="allow-scripts allow-forms allow-same-origin allow-popups allow-downloads"
10. _next/static/... 404s after deploy
Symptom: App opens, blank page or missing JS, console shows 404s like /_next/static/chunks/.... Browser network tab shows requests going to /_next/... not /api/app-static/<slug>/_next/....
Root cause: next.config.js is missing basePath and assetPrefix. Without them, Next.js generates absolute paths starting with /_next/, but the OS shell serves the app from /api/app-static/<slug>/.
Fix: Use the exact template from part2-implementation.md:
const platform = require('./platform.json');
const basePath = `/api/app-static/${platform.slug}`;
module.exports = {
output: 'export',
basePath,
assetPrefix: basePath,
trailingSlash: true,
images: { unoptimized: true },
generateBuildId: async () => null,
};
11. InsForge .single() throws on 0 rows
Symptom: Code does .single() on a query that should be optional, gets uncaught exception that escapes the route's try-catch.
Root cause: InsForge's .single() (unlike Postgrest's) throws when 0 rows match rather than returning null. The exception propagates past route handlers if not caught.
Fix: Use .limit(1) + array access:
// Wrong
const { data } = await admin.database.from('x').select('*').eq('id', id).single()
// Right
const { data: rows } = await admin.database.from('x').select('*').eq('id', id).limit(1)
const row = Array.isArray(rows) ? rows[0] : null
And always wrap in try-catch when failure is non-fatal.
12. Platform 500s with empty body (cold start)
Symptom: First request to /api/ext/v1/* after a quiet period returns 500 with no body. Subsequent requests work.
Root cause: Zeabur container cold start. Module-level imports (cloudClient() initialization, etc.) take 1-2s; first request can race the init.
Fix: Just retry. Don't conclude the API is broken from a single 500. Verify with curl /api/ext/v1/me after the first 500 — if it returns 200, the cold start is the cause.
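For headless callers, the "just retry" advice can be wrapped in a helper that retries once on a 500. This is a sketch under assumptions: `withColdStartRetry` is a hypothetical name, the 1500 ms default is a guess based on the 1-2s init time mentioned above, and the request is injected so the helper stays independent of any particular HTTP client.

```typescript
type StatusLike = { status: number }

// Hypothetical wrapper: runs the request, and retries once if it returns 500.
async function withColdStartRetry<T extends StatusLike>(
  request: () => Promise<T>,
  delayMs = 1500, // assumed delay, roughly matching the cold-start time
): Promise<T> {
  const first = await request()
  if (first.status !== 500) return first
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
  return request()
}
```

A second 500 is returned to the caller unchanged, which matches the guidance not to keep retrying past a genuine failure.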
13. pdfjs-dist (or other dep) build fails: "Cannot find module"
Symptom: Zeabur build fails with Type error: Cannot find module 'pdfjs-dist' or its corresponding type declarations. The package IS in apps/<slug>-app/package.json but root tsconfig.json is type-checking the file.
Root cause: Root tsconfig.json includes **/*.ts, which sweeps in apps/<slug>-app/src/lib/*.ts. Those files import pdfjs-dist which is only in the app's local package.json, not root.
Fix: Add to root tsconfig.json:
{
"exclude": ["apps/**/*", "node_modules", ".next"]
}
Or run npm install <dep> at root too. Excluding is cleaner.
14. npm install pdfjs-dist outputs error in PowerShell with tail
Symptom: npm install ... | tail -5 fails with tail not recognized.
Root cause: tail is not a PowerShell cmdlet.
Fix: Use Select-Object -Last 5 in PowerShell or pipe through bash. Annoying but stable: npm install foo 2>&1 | Select-Object -Last 5.
15. Cloudflare token verification before deploy
Before deploying a CF Worker, verify the token works:
curl -s "https://api.cloudflare.com/client/v4/user/tokens/verify" \
-H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"
# Expect: {"result":{"id":"...","status":"active"},"success":true,...}
If success:false → token wrong / expired. If 403 → token doesn't have Workers Scripts:Edit. Re-create with the "Edit Cloudflare Workers" template.
Then deploy:
cd infra/cf-workers/<name>
CLOUDFLARE_API_TOKEN=<token> npx wrangler deploy
The output prints the https://<name>.<sub>.workers.dev URL. Copy that into Zeabur env vars.
16. Setting Zeabur env vars via API (no dashboard needed)
curl -s -X POST 'https://api.zeabur.com/graphql' \
-H "Authorization: Bearer $ZEABUR_API_KEY" \
-H 'Content-Type: application/json' \
-d "{\"query\":\"mutation { createEnvironmentVariable(serviceID: \\\"$SVC\\\", environmentID: \\\"$ENV\\\", key: \\\"$K\\\", value: \\\"$V\\\") { key value } }\"}"
The mutation is createEnvironmentVariable, not addEnvironmentVariable (will get a "did you mean" hint if wrong). Update existing var: updateEnvironmentVariable. After mutation, env vars apply to the next deploy — push a noop commit if you need to force a redeploy now.
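The nested escaping in the curl command is easy to get wrong, so a script may prefer to build the request body programmatically. The sketch below only constructs the JSON body for the `createEnvironmentVariable` mutation shown above; `buildCreateEnvVarBody` is a hypothetical helper, and it assumes that JSON string escaping is acceptable as GraphQL string escaping for ordinary values.

```typescript
// Hypothetical helper: builds the POST body for the Zeabur GraphQL endpoint.
function buildCreateEnvVarBody(
  serviceID: string,
  environmentID: string,
  key: string,
  value: string,
): string {
  // JSON.stringify yields a double-quoted, escaped literal for each argument.
  const q = (s: string): string => JSON.stringify(s)
  const query =
    `mutation { createEnvironmentVariable(serviceID: ${q(serviceID)}, ` +
    `environmentID: ${q(environmentID)}, key: ${q(key)}, value: ${q(value)}) { key value } }`
  return JSON.stringify({ query })
}

console.log(buildCreateEnvVarBody('svc-1', 'env-1', 'USPTO_PROXY_URL', 'https://proxy.example.com'))
```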
18. Pre-push hook failure on first run (transient)
Symptom: git push fails with a hook error (e.g., npm run test:coverage:ci exits non-zero, or tsc --noEmit times out) even though the code is clean. Re-running the same push immediately succeeds.
Root cause: Pre-push hooks run in the same process as the git operation. On cold starts (first push after a long idle, or a fresh terminal), Node.js/TypeScript module resolution can race against the hook timer, especially on Windows with large repos.
Fix: Just retry. Push again with the same command. If the second attempt fails with an actual error message (not a timeout/ENOENT), then investigate.
Anti-pattern: --no-verify to skip the hook. Only use --no-verify when you know the error is a known flake AND you'll fix the root cause immediately after.
19. @google/generative-ai SchemaType: use enum, not string literals
Symptom: Zeabur build fails with TS2322: Type '"array"' is not assignable to type 'SchemaType'. Locally tsc --noEmit also fails. A @ts-expect-error suppressor added above the schema const triggers TS2578: Unused '@ts-expect-error' directive (because the error it was suppressing now appears on a different line).
Root cause: @google/generative-ai >= 0.24 requires SchemaType enum values (.ARRAY, .OBJECT, .STRING, .INTEGER, .BOOLEAN) instead of raw string literals. The SDK previously accepted strings via a looser union type; stricter TypeScript versions now reject them.
Fix:
// Wrong
import { GoogleGenerativeAI, type Schema } from '@google/generative-ai'
const schema: Schema = { type: 'array', items: { type: 'object', ... } } // ❌
// Right
import { GoogleGenerativeAI, SchemaType } from '@google/generative-ai'
import type { Schema } from '@google/generative-ai'
const schema: Schema = { type: SchemaType.ARRAY, items: { type: SchemaType.OBJECT, ... } } // ✅
Also fix the test mock: If you mock @google/generative-ai in vitest, the mock factory must export SchemaType — otherwise the route file imports SchemaType as undefined at module evaluation time, and SchemaType.ARRAY throws a runtime error that fails all tests:
vi.mock('@google/generative-ai', () => ({
SchemaType: { ARRAY: 'array', OBJECT: 'object', STRING: 'string', INTEGER: 'integer', BOOLEAN: 'boolean', NUMBER: 'number' },
GoogleGenerativeAI: vi.fn().mockImplementation(() => ({ ... })),
}))
20. ipos_apps.author_org_id must be set for org-owned apps
Symptom: App deploy call returns 403 or the app doesn't appear in the OS store for the intended org. The deploy went through but permissions fail at the org boundary.
Root cause: ipos_apps.author_org_id = null means the app is treated as a platform app (owned by IPOS itself). Org-level permission checks (org_write: ["app_files"] etc.) resolve differently for platform apps vs org-owned apps. An org-owned app dogfooding its own platform must have author_org_id set to the org's UUID.
Fix: After inserting the row in ipos_apps, update it:
UPDATE ipos_apps SET author_org_id = '<your-org-uuid>' WHERE slug = '<your-slug>';
Or include it in the INSERT:
await admin.database.from('ipos_apps').insert({
slug: 'your-slug',
name: 'Your App',
author_org_id: 'f0753aa5-dcbe-4310-a65f-96e80729d3ef', // ← required for org-owned apps
...
})
To find your org UUID: SELECT id FROM organizations WHERE name = 'Your Org Name' via InsForge admin or zeabur-service-exec.
21. Direct fetch() from iframe to platform routes needs credentials: 'include'
Symptom: Platform route returns 401 when called from the app. The app is running in the OS iframe. buildRequestContext can't find a userId.
Root cause: The platform-js SDK's internal methods (like platform.db(), platform.appFileUpload()) use the shell-injected auth token automatically. But if the app makes a direct fetch() call to a platform route (e.g. /api/platform/v1/scan-doc/analyze), the browser won't include the session cookie unless explicitly told to.
The app and its platform routes are on the same origin (ipdesk.ai), so credentials: 'include' works — the InsForge session cookie is present and buildRequestContext can read it.
Fix: Always include credentials: 'include' when calling platform routes directly from the zip app:
const res = await fetch('/api/platform/v1/<route>', {
method: 'POST',
credentials: 'include', // ← required for cookie-session auth
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ ... }),
})
This only applies to direct fetch() calls. If you call a route through platform.db() or platform.appFileUpload(), auth is handled automatically.
17. Patent number normalization: USPTO uses bare integer
Symptom: Calling USPTO with 09943073 (8-digit padded) → 404. Calling with 9943073 (7-digit bare) → 200.
Root cause: image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/<n> expects bare integer with no leading zeros, no commas, no US prefix.
Fix: Normalize before calling:
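export function padPatentNumber(raw: string): string {
// Trim first so leading whitespace cannot hide the US prefix, then drop commas and any remaining whitespace
return raw.trim().replace(/^US/i, '').replace(/[,\s]/g, '').replace(/^0+/, '') || '0'
}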
US09943073 → 9943073 → 200.
Reference: Flow Compliance Report
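Flow Compliance Report (customer-facing HTML deliverable)
Purpose: once the Part 2 implementation is done, produce an HTML report for the customer. It lets a non-engineer customer understand how their original tool or workflow was moved to IPOS, where it complies with IPOS rules, and where adjustments are needed.
Reader assumption: the customer knows Excel and can use a browser, but may not know what an API, a bucket, or postMessage is. Write in plain language throughout: say "button" (按鈕) rather than "control", and "network request" (網路請求) rather than "HTTP route".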
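When to produce it
- Write it before Part 2 "Build, Package, Deploy" (so the spec can still be changed if the deploy fails)
- Update it once more after a successful deploy (fill in the actual deploymentId and version)
Steps
- Ask the customer where it should go. Every customer's directory structure is different, so ask first: 「這份 HTML 流程說明書要放在哪?預設是 docs/<slug>-flow-compliance.html,可以嗎?或你想放別的位置?」 ("Where should this HTML flow document go? The default is docs/<slug>-flow-compliance.html. Is that OK, or would you prefer another location?")
- Write the file only after they confirm. Do not put it in the zip (it is documentation, not part of the app).
- Suggested filename: <slug>-flow-compliance.html (e.g. g1001-flow-compliance.html)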
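Three core sections (A + B + C, written in words the customer understands)
A. Your original workflow (Before)
Describe how the customer does the job today, as a flowchart or numbered steps. For example:
- "Open the Excel file → paste the patent numbers → run the macro → wait 5 minutes → results saved to D:\output\"
- "Open Streamlit → upload the PDF → click OCR → download the CSV"
Writing tip: use the customer's own words; do not translate them into technical language.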
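B. The new workflow after moving to IPOS (After)
Show how the same job is done on IPOS:
- Which window is it opened in?
- Which button is clicked?
- Where is the data stored? (IPOS cloud / downloaded by the customer / handled by the OS Agent)
- Which steps are no longer needed (because IPOS handles them automatically)?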
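C. Compliant / non-compliant comparison (Compliance Checklist)
Use a three-column table: Item | Status | Explanation (plain language)
It must cover these checks (phrased in words the customer understands):
| Technical check | Plain-language wording |
|---|---|
| No customer-managed servers | 「整個工具不用你自己開機/維護伺服器」 (You never have to start or maintain a server for this tool) |
| No DB credentials in the code | 「裡面沒有寫死任何資料庫密碼」 (No database password is hard-coded inside) |
| Platform routes used | 「IPOS 幫你跑的雲端服務(OCR / 翻譯 / 外部查詢)」 (Cloud services IPOS runs for you: OCR / translation / external lookups) |
| Agent Control Surface is complete | 「OS Agent(小幫手)能幫你按下每一個按鈕、看到每個欄位」 (The OS Agent, your assistant, can press every button and see every field for you) |
| No hidden UI state | 「沒有 Agent 看不到的隱藏資料」 (There is no hidden data the Agent cannot see) |
| Every redeploy bumps the version | 「每次更新都會升版號,不會跟舊版混淆」 (Every update gets a new version number, so it is never confused with an older one) |
Each row takes one of three statuses:
- ✅ Compliant (符合): one sentence on how it is achieved
- ⚠️ Partially compliant (部分符合): why, and where the customer needs to help (e.g. "we need sample PDFs from you to calibrate the OCR")
- ❌ Non-compliant (不符合): why, and what IPOS or the customer must do to close the gap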
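D. Remediation plan for non-compliant items (Remediation Plan)
If there is any ⚠️ / ❌, list for each item:
- What the problem is (plain language)
- Who handles it (customer / IPOS / both)
- How it is expected to be handled (e.g. "IPOS will add an OCR cloud service, expected in X weeks")
- Can the tool be used before the fix lands? Where will it get stuck?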
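E. What the Agent can do (capability list for the customer)
Translate platform.json.agent_actions into plain language, for example:
- ❌ Don't write: set_patent_number(value: string)
- ✅ Do write: 「填入專利號碼」 ("Fill in the patent number")
List what the Agent can read (state) and what it can do (actions), so the customer knows "these are the things I can ask the OS assistant to do".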
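F. IPOS cloud services used (Platform Routes)
If this app uses a platform route provided by IPOS (OCR, translation, external API proxy, etc.), list here:
- Service name (plain language)
- What it is used for
- Maintained by IPOS; the customer does not need to manage it
HTML template (copy and adapt)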
<!DOCTYPE html>
<html lang="zh-TW">
<head>
<meta charset="UTF-8">
<title>{{App 名稱}} — IPOS 流程對照與規則檢查</title>
<style>
body { font-family: -apple-system, "Segoe UI", "Microsoft JhengHei", sans-serif;
max-width: 880px; margin: 40px auto; padding: 0 20px; color: #222;
line-height: 1.7; }
h1 { border-bottom: 3px solid #2563eb; padding-bottom: 8px; }
h2 { color: #1e40af; margin-top: 40px; }
h3 { color: #374151; }
.meta { background: #f3f4f6; padding: 12px 16px; border-radius: 8px;
font-size: 14px; color: #555; }
table { width: 100%; border-collapse: collapse; margin: 16px 0; }
th, td { border: 1px solid #d1d5db; padding: 10px 12px; text-align: left;
vertical-align: top; }
th { background: #f9fafb; }
.ok { color: #15803d; font-weight: 600; }
.warn { color: #b45309; font-weight: 600; }
.fail { color: #b91c1c; font-weight: 600; }
.flow { background: #eff6ff; padding: 16px 20px; border-left: 4px solid #2563eb;
border-radius: 4px; margin: 12px 0; }
.flow ol { margin: 0; padding-left: 20px; }
code { background: #f3f4f6; padding: 2px 6px; border-radius: 3px;
font-size: 0.9em; }
</style>
</head>
<body>
<h1>{{App 名稱}} — 流程對照與規則檢查</h1>
<div class="meta">
<strong>客戶:</strong>{{客戶名}}<br>
<strong>App slug:</strong><code>{{slug}}</code><br>
<strong>版本:</strong>{{version}}<br>
<strong>產出日期:</strong>{{YYYY-MM-DD}}<br>
<strong>狀態:</strong>{{草稿 / 已上線 deploymentId=xxx}}
</div>
<h2>1. 你原本是怎麼做這件事的</h2>
<div class="flow">
<ol>
<li>{{步驟 1,用客戶自己的話}}</li>
<li>{{步驟 2}}</li>
<li>{{...}}</li>
</ol>
</div>
<p>{{原本流程的痛點:例如要裝 Python、要等很久、檔案散落、換電腦就不能用 ...}}</p>
<h2>2. 搬到 IPOS 之後變這樣</h2>
<div class="flow">
<ol>
<li>{{步驟 1,例:在 IPOS 桌面打開 g1001 視窗}}</li>
<li>{{步驟 2}}</li>
<li>{{...}}</li>
</ol>
</div>
<p>{{改善點:不用裝環境、瀏覽器就能跑、結果自動存到 IPOS 雲端、可以叫 Agent 幫你做 ...}}</p>
<h2>3. IPOS 規則檢查(你的 app 符合嗎?)</h2>
<table>
<thead>
<tr><th>檢查項目</th><th>狀態</th><th>說明</th></tr>
</thead>
<tbody>
<tr>
<td>整個工具不用自己開伺服器</td>
<td><span class="ok">✅ 符合</span></td>
<td>{{說明}}</td>
</tr>
<tr>
<td>程式碼裡沒有任何資料庫密碼</td>
<td><span class="ok">✅ 符合</span></td>
<td>{{說明}}</td>
</tr>
<tr>
<td>OS 小幫手(Agent)看得到所有欄位、按得了所有按鈕</td>
<td><span class="ok">✅ 符合</span></td>
<td>{{說明}}</td>
</tr>
<tr>
<td>每次更新會升版號</td>
<td><span class="ok">✅ 符合</span></td>
<td>目前版本 {{version}}</td>
</tr>
<tr>
<td>{{範例:OCR 功能}}</td>
<td><span class="warn">⚠️ 部分符合</span></td>
<td>{{IPOS 還在做這個雲端服務,預計 X 週後完成}}</td>
</tr>
</tbody>
</table>
<h2>4. 還沒符合的部分要怎麼處理</h2>
<table>
<thead>
<tr><th>問題</th><th>誰處理</th><th>怎麼處理</th><th>在這之前能用嗎?</th></tr>
</thead>
<tbody>
<tr>
<td>{{問題白話描述}}</td>
<td>{{IPOS / 你 / 一起}}</td>
<td>{{做法}}</td>
<td>{{能用,只是 X 功能會跳提示 / 不能用,要等}}</td>
</tr>
</tbody>
</table>
<h2>5. OS 小幫手(Agent)能幫你做什麼</h2>
<p>你可以對 OS 小幫手說這些話,它會幫你操作這個 app:</p>
<ul>
<li>「{{範例:把專利號碼填成 US12345678}}」</li>
<li>「{{範例:開始下載}}」</li>
<li>「{{範例:取消 / 重設}}」</li>
</ul>
<p>它也看得到這些東西(不用你截圖給它):</p>
<ul>
<li>{{目前填了什麼}}</li>
<li>{{現在跑到第幾步}}</li>
<li>{{有沒有錯誤訊息}}</li>
</ul>
<h2>6. 用到哪些 IPOS 雲端服務</h2>
<table>
<thead>
<tr><th>服務</th><th>做什麼用</th><th>誰維護</th></tr>
</thead>
<tbody>
<tr>
<td>{{範例:OCR 文字辨識}}</td>
<td>把 PDF 裡的圖片轉成可搜尋的文字</td>
<td>IPOS(你不用管)</td>
</tr>
</tbody>
</table>
<h2>7. 結論</h2>
<p>{{一段話總結:這個 app 是否完全符合 IPOS 規則、可以正式使用、還是有等待事項。}}</p>
</body>
</html>
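Writing checklist (review once before delivering)
- [ ] No words like "API / endpoint / route / postMessage / iframe / manifest / bucket" anywhere in the report (unless explained)
- [ ] Every ⚠️ / ❌ has a matching remediation plan (who, how, when)
- [ ] Every Agent action is translated into "something you say to the OS assistant"
- [ ] After reading, the customer knows: (1) can I start using it, (2) what is still pending, (3) do I need to do anything myself
- [ ] If the deploy has succeeded, the "Status" (狀態) field at the top has the deploymentId and version filled in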
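The first checklist item (no unexplained jargon) lends itself to a rough automated reminder. The sketch below is a crude, case-insensitive substring scan over the report's visible text; `findJargon` is a hypothetical helper, and because the checklist allows a term when it is explained, a hit means "review this", not "fail".

```typescript
// Terms from the checklist that should not appear unexplained in the report.
const JARGON = ['API', 'endpoint', 'route', 'postMessage', 'iframe', 'manifest', 'bucket']

// Hypothetical helper: returns the listed terms found in the text.
// Substring matching is deliberately simple and may over-report.
function findJargon(reportText: string): string[] {
  const lower = reportText.toLowerCase()
  return JARGON.filter((word) => lower.includes(word.toLowerCase()))
}

console.log(findJargon('這個工具會呼叫 API 並透過 postMessage 溝通')) // [ 'API', 'postMessage' ]
```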
Worked example: g1001 (Pattern A/B — external API proxy)
g1001 Walkthrough — Complete Worked Example
g1001 is the reference implementation for ipos-sdk-converter. It was originally a Streamlit Python app that extracts US patent numbers from USPTO PTO-892 Office Actions and downloads each patent as a PDF. It has been converted to an IPOS SDK app with full feature parity and zero customer servers, and it ships end-to-end in apps/g1001-app/.
This is the doc to read when you want to see what a successful conversion looks like.
Original app summary
Stack: Python 3, Streamlit, PyMuPDF (PDF extract), Mistral OCR, requests (USPTO download).
Features:
- Upload a PDF (PTO-892 form)
- Extract text with PyMuPDF
- Detect scanned PDFs and run Mistral OCR fallback
- Filter the "U.S. PATENT DOCUMENTS" section
- Extract all US patent numbers via regex
- Show editable list (user adds/removes patents)
- Download all patent PDFs from USPTO into a ZIP
- Push patent numbers to the IPOS database
Customer pain: Each user needed a local Python install and had to repeat the dependency setup every time. Different team members had different versions, and results couldn't be shared.
Part 1 — SDK Spec Sheet (filled by non-technical owner)
| Feature | Needs server? | Bucket | Resource |
|---|---|---|---|
| Upload PDF | No | zip | <input type="file"> |
| Extract text from PDF | No | zip | pdfjs-dist |
| Detect scanned PDF | No | zip | pdfjs-dist (text length < threshold) |
| OCR scanned PDF | Yes (API key + compute) | platform route | /api/platform/v1/ocr |
| Filter PTO-892 section | No | zip | JS regex |
| Extract patent numbers | No | zip | JS regex |
| Display + edit list | No | zip | React |
| Download USPTO PDFs | Yes (CORS + IP block) | platform route | /api/platform/v1/uspto-proxy |
| Package as ZIP | No | zip | jszip |
| Push patents to org DB | IPOS data | platform-js SDK | platform.db('org:prior_art').insert() |
New routes that needed building: /api/platform/v1/ocr and /api/platform/v1/uspto-proxy.
Part 2 — Implementation outcomes
Repo structure
apps/g1001-app/
├── platform.json # slug: g1001, name: G1001 專利擷取
├── package.json # next 14.2.33, react 18, pdfjs-dist 4.10, jszip 3.10
├── next.config.js # output: 'export', basePath, assetPrefix
├── pages/
│ ├── _app.tsx
│ └── index.tsx # dynamic(() => import('../src/G1001App'), { ssr: false })
└── src/
├── G1001App.tsx # step router 1 → 2 → 2.5 → 3
├── vendor/platform-js.ts # SDK copy
├── steps/
│ ├── Step1Upload.tsx # file input
│ ├── Step2Extract.tsx # pdfjs + OCR fallback via /ocr
│ ├── Step25EditList.tsx # editable patent list
│ └── Step3Download.tsx # USPTO proxy + JSZip + platform DB push + manual fallback link
└── lib/
├── normalize-patent.ts # strip US prefix, commas, leading zeros
├── pto892-parser.ts # regex: filter section + extract numbers
└── pdf-extract.ts # pdfjs-dist wrapper, isScanned detection
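The isScanned detection noted for pdf-extract.ts (text length below a threshold, per the spec sheet) could look roughly like the sketch below. The function name, the per-page averaging, and the 50-character default are all assumptions for illustration; the actual values in the g1001 code are not shown in this document.

```typescript
// Hypothetical sketch: treat a PDF as scanned when the text layer is nearly empty.
// `pageTexts` holds the text extracted from each page (e.g. via pdfjs-dist).
function isLikelyScanned(pageTexts: string[], minCharsPerPage = 50): boolean {
  if (pageTexts.length === 0) return true // nothing extracted at all
  const totalChars = pageTexts.reduce((sum, t) => sum + t.replace(/\s+/g, '').length, 0)
  return totalChars / pageTexts.length < minCharsPerPage
}

console.log(isLikelyScanned(['', '  '])) // true
console.log(isLikelyScanned(['x'.repeat(500)])) // false
```

When this returns true, the app would send the PDF to /api/platform/v1/ocr instead of parsing the empty text layer.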
Platform routes built
- app/api/platform/v1/ocr/route.ts: accepts base64 PDF, calls Mistral OCR, returns { data: { text } }. API key held in env, never reaches the browser.
- app/api/platform/v1/uspto-proxy/route.ts: accepts patent_number, normalizes via padPatentNumber, calls downloadPatentPdf in lib/platform/uspto.ts, streams bytes back as application/pdf. Returns 503 (NOT 502) on failure (Cloudflare strips 502 bodies).
CF Worker (added during implementation)
USPTO blocks Zeabur's datacenter IPs. We discovered this when 9/9 patents 502'd in production despite working locally. Fix:
- infra/cf-workers/uspto-proxy/: 30-line Cloudflare Worker that fetches USPTO from CF's edge IPs (not blocked) and caches 1 day.
- Set USPTO_PROXY_URL=https://uspto-proxy.<sub>.workers.dev on Zeabur.
- lib/platform/uspto.ts reads the env var; uses Worker if set, falls back direct otherwise.
This is now a generic pattern — see gotchas.md → "External API blocks Zeabur datacenter IP".
Browser-side fallback
Even with the CF Worker, Step3Download.tsx keeps a fallback: if proxy returns 503, render a "直接下載 ↗" link to USPTO that the user clicks (their browser IP isn't blocked, and CORS doesn't apply to navigation). Patent numbers still push to IPOS DB even when PDFs aren't fetched server-side. This is the "graceful degradation when CF Worker URL is unset" path.
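The fallback decision can be expressed as a small pure function that the step component renders from. This is a sketch, not the code in Step3Download.tsx: `outcomeFor` and `DownloadOutcome` are hypothetical names, and falling back on any non-200 status (rather than only 503) is an assumption made here for simplicity. The link target uses the USPTO download path quoted in gotcha 17.

```typescript
type DownloadOutcome =
  | { kind: 'pdf' }
  | { kind: 'manual-link'; href: string; label: string }

// Hypothetical helper: decide what to show for one patent after calling the proxy.
function outcomeFor(status: number, patentNumber: string): DownloadOutcome {
  if (status === 200) return { kind: 'pdf' }
  return {
    kind: 'manual-link',
    href: `https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/${patentNumber}`,
    label: '直接下載 ↗',
  }
}

console.log(outcomeFor(503, '9943073'))
```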
Key porting decisions (Python → TypeScript)
| Original | Replacement | Why |
|---|---|---|
| PyMuPDF (fitz) | pdfjs-dist | C extension; can't run in browser. Same conceptual API: load PDF, iterate pages, extract text items. |
| Mistral OCR via Python SDK | /api/platform/v1/ocr | Holds API key server-side. App sends base64 PDF, gets text back. |
| requests.get('https://image-ppubs.uspto.gov/...') | /api/platform/v1/uspto-proxy → CF Worker → USPTO | Two layers: browser CORS (no headers) + Zeabur IP block (datacenter ASN). Both solved by routing through CF edge. |
| ipdesk_sdk.py.push(rows) (was never wired up) | platform.db('org:prior_art').insert(row) looped per row | SDK takes single row; loop + /duplicate key/i dedupe on retries. |
| Streamlit step state | React useState + step router in G1001App.tsx | Same UX, runs in browser instead of Streamlit server. |
Bugs encountered (and the gotchas they spawned)
Every bug below caused at least one user-visible failure. All are now in references/gotchas.md.
| Bug | Symptom | Fix in this app |
|---|---|---|
| PowerShell zip | "Malicious entry: 404\index.html" | Use bash zip -r ../app.zip platform.json out/ |
| Wrong zip structure | Build tries npm install, fails ENOENT | Zip with out/ as sub-folder, not root |
| Same deploymentId returned | New zip silently re-uses old failed deploy | Bump version in platform.json |
| prior_art read-only | does not allow inserts via SDK | Add insertable_columns to lib/platform/org-table-config.ts |
| .insert(array) cast | TS lets it compile, runtime route 400s | Loop one row at a time |
| source: 'g1001' extra column | column "source" not in insertable_columns | prior_art has no source column, so drop it |
| 502 from proxy | Cloudflare replaces body with HTML page | Return 503 from origin |
| 403 from USPTO | Datacenter IP blocked | CF Worker proxy + USPTO_PROXY_URL env |
| iframe download blocked | "Download is disallowed" | Add allow-downloads to sandbox in components/os/Window.tsx |
| pdfjs-dist not found | Root tsconfig sweeps in the app's .ts files | npm install pdfjs-dist at root OR exclude apps/** in tsconfig |
What customers learn from this example
- Any Python app can convert — even ones with C-extension deps (PyMuPDF), API keys (Mistral), and CORS+IP-blocked external APIs (USPTO).
- Server-side needs map cleanly to platform routes. No customer ever runs a Python server.
- The diagnosis sheet front-loads decisions. If they had skipped Part 1 and dived into code, they'd have built a Lambda for USPTO downloads — wrong solution.
- CF Workers are a generic escape hatch for IP-blocked APIs. First time we hit it, we fixed it. Now it's a pattern.
- Each gotcha encountered → permanent skill knowledge. The next app conversion won't re-hit any of these.
Production state (as of last verified)
| Item | Value |
|---|---|
| g1001 version | 1.0.5 |
| Deploy ID | f7053709-636a-428e-990e-0005a7f9e56a (status: live) |
| Runtime URL | /api/app-static/g1001/ |
| CF Worker URL | https://uspto-proxy.sku772003.workers.dev |
| Test patent | 9943073 → 200 OK, 332652 bytes, %PDF magic ✅ |
| End-to-end | 9/9 patents downloaded, ZIP delivered, 9 rows in prior_art |
To redeploy this app from scratch: see part2-implementation.md → "Build, Package, Deploy".
Worked example: scan-doc (Pattern E — AI analysis + file storage)
scan-doc Walkthrough — AI Document Analysis + File Storage
scan-doc (掃描文書分割) is the second reference implementation for ipos-sdk-converter. It converts a Streamlit-based Python tool (scan-doc-organizer) that uses Gemini Vision to identify legal document boundaries in combined scanned PDFs, then splits them and saves each document as a separate file to 檔案總管.
This walkthrough demonstrates Pattern E: AI analysis with structured output + platform.appFileUpload() — the pattern to use whenever an app needs to call an AI Vision API and store output files in the org's file manager.
Original app summary
Stack: Python 3, Streamlit, google-genai Python SDK, pdf2image, pypdf, Pillow.
Features:
- Upload a combined PDF (scanned legal documents, multiple per file)
- Send PDF to Gemini Vision for analysis — identify each document's page range, type, case number, court, suggested filename
- Show editable table of identified documents (user can adjust page ranges / filenames)
- Split the PDF into per-document files using explicit page lists
- Download the split files as a ZIP
Customer pain: Tool ran only on the developer's laptop. Required Python + poppler + API key setup. Output ZIPs were ephemeral — no org-level storage, no audit trail.
Part 1 — SDK Spec Sheet
| Feature | Needs server? | Bucket | Resource |
|---|---|---|---|
| Upload PDF (max 10 MB) | No | zip | <input type="file"> |
| Validate PDF format | No | zip | Magic bytes check %PDF- in JS |
| Send PDF to Gemini Vision for analysis | Yes (platform API key) | platform route | /api/platform/v1/scan-doc/analyze |
| Show editable document table | No | zip | React |
| Validate page ranges (start ≤ end) | No | zip | JS pre-flight check |
| Split PDF into per-document files | Yes (compute) | platform route | /api/platform/v1/scan-doc/split |
| Save split PDFs to 檔案總管 | IPOS data | platform-js SDK | platform.appFileUpload() |
New routes built: /api/platform/v1/scan-doc/analyze and /api/platform/v1/scan-doc/split.
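The two in-browser checks from the spec sheet (the %PDF- magic bytes and start ≤ end page ranges) might look like the sketch below. The `PageRange` shape, the 1-based inclusive numbering, and reading "10 MB" as 10 × 1024 × 1024 bytes are assumptions made for illustration, not details taken from the app.

```typescript
// Assumption: "10 MB" is read as binary megabytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// True when the buffer starts with the ASCII bytes "%PDF-".
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < magic.length) return false;
  return magic.every((b, i) => bytes[i] === b);
}

// Hypothetical row shape for the editable table: 1-based, inclusive.
interface PageRange {
  start: number;
  end: number;
}

// Returns an error message, or null when the range is usable.
function pageRangeError(range: PageRange, pageCount: number): string | null {
  if (!Number.isInteger(range.start) || !Number.isInteger(range.end)) {
    return "page numbers must be integers";
  }
  if (range.start > range.end) {
    return "start page is after end page";
  }
  if (range.start < 1 || range.end > pageCount) {
    return `pages must be between 1 and ${pageCount}`;
  }
  return null;
}

// Combined pre-flight check for an uploaded file.
function uploadError(bytes: Uint8Array): string | null {
  if (bytes.length > MAX_UPLOAD_BYTES) return "file is larger than 10 MB";
  if (!looksLikePdf(bytes)) return "file is not a PDF";
  return null;
}
```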
Key decisions vs g1001:
- g1001: external API calls (USPTO) → needs CF Worker for IP block
- scan-doc: AI model call (Gemini) → ideally uses AI Actions system (executeAction); current implementation uses GEMINI_API_KEY env var directly as a shortcut (tech debt: bypasses the AI Actions is_enabled gate and usage logging). See the tech debt note in route-design-patterns.md, Pattern E.
Part 2 — Implementation outcomes
Repo structure
apps/scan-doc-app/
├── platform.json # slug: scan-doc, permissions: { org_write: ["app_files"] }
├── package.json # next 14.2.33, react 18, pdfjs-dist 4.10
├── next.config.js # output: 'export', basePath, assetPrefix
├── pages/
│ ├── _app.tsx
│ └── index.tsx # dynamic(() => import('../src/ScanDocApp'), { ssr: false })
└── src/
    ├── ScanDocApp.tsx # Platform.connect(), state machine, agent actions
    ├── vendor/platform-js.ts # SDK copy (same as g1001)
    └── steps/
        ├── Step1Upload.tsx # file picker → validate → call analyze route
        ├── Step2Analyzing.tsx # spinner while route runs
        └── Step3Results.tsx # editable table → call split → appFileUpload loop
Platform routes built
- app/api/platform/v1/scan-doc/analyze/route.ts: validates the PDF (magic bytes + pdf-lib page count), then sends it as inlineData to Gemini 2.5 Flash with a responseSchema built from the SchemaType enum. Returns { data: { documents, page_count } }.
- app/api/platform/v1/scan-doc/split/route.ts: pure pdf-lib splitting; no external calls. Accepts explicit pages: number[] per document for flexible page reordering. Returns { data: { files: [{ filename, pdf_b64 }] } }.
What the AI actually does
Gemini 2.5 Flash Vision analyzes the PDF inline (no rendering needed — Gemini accepts inlineData: { mimeType: 'application/pdf', data: base64 }). The prompt instructs it to identify each independent legal document within the combined scan by looking for:
- Receipt stamps (收文章) — each stamp indicates a new document
- Different case numbers (案號) — different numbers = different documents
- Court identifiers, document type, dates
It returns a structured JSON array (enforced via responseSchema) with page boundaries and suggested filenames. The user can correct any misidentifications in the editable table before splitting.
File storage flow
After splitting, the app loops platform.appFileUpload() per document:
for (const file of splitResult.files) {
  await platform.appFileUpload({
    name: `${file.filename}.pdf`,
    mime_type: 'application/pdf',
    content_b64: file.pdf_b64,
    metadata: { source: 'scan-doc', original_file: result.file_name },
  })
}
Files land in app_files (org-scoped) and appear immediately in 檔案總管. platform.json declares permissions: { org_write: ["app_files"] } to authorize this.
Bugs encountered (and gotchas they spawned)
| Bug | Symptom | Fix |
|---|---|---|
| SchemaType string literals in @google/generative-ai | TS2322: Type '"array"' not assignable to 'SchemaType' → Zeabur build fails | Use SchemaType.ARRAY enum. See gotchas #19 |
| Test mock missing SchemaType | All 6 analyze tests fail: SchemaType is undefined at module eval | Add SchemaType: { ARRAY: 'array', ... } to vi.mock('@google/generative-ai', () => ({ ... })). See gotchas #19 |
| author_org_id = null in ipos_apps | Deploy 403 / wrong permission scope | Set author_org_id to org UUID. See gotchas #20 |
| PowerShell zip backslash entries | "Malicious entry: 404\index.html" on upload | Use bash zip -r. See gotchas #1 |
| Direct fetch needs cookie auth | Platform route returns 401 from iframe | Add credentials: 'include' to every direct fetch() to platform routes. See gotchas #21 |
| Page range inversion (start > end) | Silent empty output document | Pre-flight JS validation before calling split route |
| doc.pages not validated server-side | Empty page array slips through, pdf-lib produces zero-page PDF | Validate Array.isArray(doc.pages) && doc.pages.length > 0 in split route |
| Buffer.from() doesn't throw on bad base64 | Invalid base64 treated as garbage bytes | Check %PDF- magic bytes after decode |
| pdf-lib load() fails on some real-world PDFs | pageCount is wrong | Wrap in try-catch, fall back to 1; page count is informational only |
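The server-side pages check from the table above can be extended slightly so that out-of-range or non-integer page numbers are also rejected before pdf-lib runs. This is a sketch only: the function name is hypothetical, and 1-based page numbering is an assumption not confirmed by the source.

```typescript
// Returns an error message, or null when `pages` is a usable explicit page list.
// Order and repeats are allowed, since the split route supports reordering.
function pagesError(pages: unknown, pageCount: number): string | null {
  if (!Array.isArray(pages) || pages.length === 0) {
    return "pages must be a non-empty array";
  }
  for (const p of pages) {
    const valid =
      typeof p === "number" && Number.isInteger(p) && p >= 1 && p <= pageCount;
    if (!valid) {
      return `invalid page number: ${String(p)}`;
    }
  }
  return null;
}
```

A route handler could return a 400 with this message instead of producing a zero-page PDF.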
Key porting decisions (Python → TypeScript)
| Original | Replacement | Why |
|---|---|---|
| google-genai Python SDK + pdf2image for rendering | /api/platform/v1/scan-doc/analyze with inlineData | Gemini API accepts PDF inline; no page rendering needed. API key kept server-side. |
| pypdf.PdfReader / PdfWriter for splitting | /api/platform/v1/scan-doc/split with pdf-lib | pdf-lib is pure JS, works in Node.js without native deps. Supports page reordering via explicit page list. |
| Streamlit editable table + ZIP download | React editable table + platform.appFileUpload() loop | No ZIP download needed; files go directly into org's 檔案總管. User gets permanent storage instead of a one-time download. |
| Local file saved on laptop | platform.appFileUpload() → app_files | All processed docs are now org-accessible and auditable. |
Production state
| Item | Value |
|---|---|
| App slug | scan-doc |
| Version | 1.0.0 |
| Deploy ID | da93f5db-1544-4bee-a1ba-0fcd4d7cb062 (status: live) |
| Runtime URL | /api/app-static/scan-doc/ |
| Test cases | 10/10 (6 analyze + 4 split) |
| AI model | Gemini 2.5 Flash |
| Env var required | GEMINI_API_KEY on Zeabur |
To redeploy: see part2-implementation.md → "Build, Package, Deploy". Bump version in platform.json before re-uploading.
Reference: Session Persistence (opt-in cross-device state)
Session Persistence — Opt-in Contract
Audience: developers shipping an IPOS SDK app who want their app to remember its in-flight state across devices.
Status: opt-in. Apps that do not opt in still get cross-device window geometry (position/size/z-order) — the OS handles that without your help.
When to opt in
Opt in when your app has user-meaningful in-app state that should survive closing the browser on machine A and re-opening on machine B:
- Form drafts (subject + body of an unfinished email, a partially-filled wizard)
- Current view / tab / filter selection
- Pinned items or local sort order
- Anything the user would consider "where I left off"
Do not opt in for:
- Auth tokens, refresh tokens, session keys (regenerate from cookies)
- Live WebSocket connections, in-flight uploads (re-establish on mount)
- Large blobs (limit is 64 KB JSON per window — uploads or images go in storage, not session state)
- Data already in the user's database (case lists, email cache — re-fetch)
How to opt in
1. Manifest
Add to platform.json (or v2 bundle manifest):
{ "session_persistence": true }
2. Implement two control-surface commands
The OS speaks two standard postMessage shapes for session persistence:
OS → app (on mount, if a prior snapshot exists):
{ "type": "ipos.session.restore",
"windowId": "...",
"payload": { "state": <your state>, "schemaVersion": 3 } }
App → OS (when persistable state changes):
{ "kind": "ipos:request",
"type": "session.snapshot",
"id": <request id>,
"payload": { "state": <your state>, "schemaVersion": 3 } }
The OS debounces incoming snapshots: repeated messages within ~1 second coalesce
into a single network write. The OS acknowledges each snapshot with a normal
ipos:response.
3. Trigger a snapshot when state changes
If your SDK helper exposes ipos.notifySnapshot(), call it on every relevant
state change. Otherwise post manually:
window.parent.postMessage(
{ kind: 'ipos:request', type: 'session.snapshot', id: Date.now(),
payload: { state: getCurrentDraft(), schemaVersion: 3 } },
'*',
);
Size, frequency, and schema rules
| Rule | Limit |
|---|---|
| state JSON size (after stringify) | 64 KB |
| Snapshot frequency | OS debounces to 1 per second per window |
| schemaVersion | Plain integer. Increase when state shape changes. |
Oversized payloads return 413 from the server; your app receives no acknowledgement and the previous good snapshot remains the persisted state.
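Since an oversized snapshot is dropped without an acknowledgement, an app may prefer to check the size itself before posting. The sketch below builds the App → OS message shown earlier and returns null when the serialized state is too large. It assumes "64 KB" means 65,536 bytes measured as UTF-8; `buildSnapshotMessage` and `utf8ByteLength` are hypothetical helpers, not SDK functions.

```typescript
// Assumption: the limit is 64 * 1024 bytes of UTF-8 encoded JSON.
const MAX_STATE_BYTES = 64 * 1024;

// Counts UTF-8 bytes without relying on TextEncoder.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

interface SnapshotMessage {
  kind: "ipos:request";
  type: "session.snapshot";
  id: number;
  payload: { state: unknown; schemaVersion: number };
}

// Returns the message to post, or null when the state cannot be persisted.
function buildSnapshotMessage(
  state: unknown,
  schemaVersion: number,
  id: number,
): SnapshotMessage | null {
  const json = JSON.stringify(state);
  if (json === undefined || utf8ByteLength(json) > MAX_STATE_BYTES) {
    return null;
  }
  return {
    kind: "ipos:request",
    type: "session.snapshot",
    id,
    payload: { state, schemaVersion },
  };
}
```

When the helper returns null, the app can trim the state (for example, drop cached lists) or skip the snapshot, leaving the previous good snapshot in place.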
Schema migration
Snapshots can outlive code deploys. When you change the state shape, bump
schemaVersion. Your restore handler decides:
- If schemaVersion === current → apply.
- If older → migrate or discard.
- If newer (rolled back deploy) → discard.
The OS never inspects the state JSON, so migration is entirely your responsibility.
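One way to structure the restore decision is a chain of single-step migrations keyed by the version they upgrade from. The sketch below is illustrative only: `resolveRestore`, `Migrations`, and the result shape are hypothetical names, and discarding when a step is missing is one possible policy.

```typescript
interface Snapshot {
  state: unknown;
  schemaVersion: number;
}

// Each entry upgrades state from version N to N + 1; the key is N.
type Migrations = { [fromVersion: number]: (state: unknown) => unknown };

type RestoreDecision =
  | { action: "apply"; state: unknown }
  | { action: "discard" };

function resolveRestore(
  snapshot: Snapshot,
  currentVersion: number,
  migrations: Migrations,
): RestoreDecision {
  // Written by newer code (for example, after a rolled-back deploy).
  if (snapshot.schemaVersion > currentVersion) {
    return { action: "discard" };
  }
  let state = snapshot.state;
  for (let v = snapshot.schemaVersion; v < currentVersion; v++) {
    const step = migrations[v];
    if (!step) return { action: "discard" }; // no path to the current shape
    state = step(state);
  }
  return { action: "apply", state };
}
```

Wrapping the call in a try/catch inside the restore handler keeps a faulty migration from throwing during mount.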
What about multiple devices open at once?
state is overwritten on every snapshot — last write wins. The expected pattern
is that the user is actively typing in one device at a time. Concurrent edits in
two browsers will lose the older one. This is acceptable because (a) the
user-visible window has only one active editor at a time, (b) the value of
"survives a browser close" is high and the value of "merges concurrent edits in
two browsers" is low.
Verification checklist
When adding session persistence to an app, verify:
- [ ] Manifest declares session_persistence: true.
- [ ] App emits session.snapshot messages with JSON ≤ 64 KB.
- [ ] App handles ipos.session.restore for current and prior schema versions.
- [ ] No auth tokens, refresh tokens, or live connection IDs appear in state.
- [ ] Sensitive PII filtered out of state (the value is stored server-side under the user's row).
- [ ] On a fresh device + same user, opening the app shows the last-saved state.
- [ ] If the schema is bumped, old snapshots are migrated or discarded gracefully (no thrown errors in restore).
Authenticated calls to /api/platform/v1/* — shell-token contract (validator R002)
Rule: every fetch from a static IPOS app to /api/platform/v1/* MUST attach the shell token; otherwise the platform route returns 401 unauthenticated with the message "no session or bearer token".
How the shell delivers the token: when the OS shell opens your app's iframe, it appends ?shell_token=<jwt> to the URL. You read it from window.location.search, then add the x-ipos-shell-token header on every platform-route call.
Canonical pattern (mirrored from apps/os-map-app/src/useTopology.ts):
// lib/api.ts
function readShellToken(): string | null {
  if (typeof window === 'undefined') return null;
  return new URLSearchParams(window.location.search).get('shell_token');
}

async function callPlatform<T>(url: string): Promise<T> {
  const headers: Record<string, string> = { accept: 'application/json' };
  const token = readShellToken();
  if (token) headers['x-ipos-shell-token'] = token;
  const r = await fetch(url, { headers });
  return r.json();
}
For SSE endpoints (EventSource cannot set headers), use ?shell_token=… as a query param — the platform routes that support SSE also accept query-string fallback (e.g. /api/platform/v1/topology/events?shell_token=…).
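For the SSE case, a small helper can append the token to the URL. This is a sketch with a hypothetical name (`withShellToken`); it uses plain string handling and does not account for URLs that contain a fragment.

```typescript
// Appends shell_token as a query parameter.
// Returns the URL unchanged when no token is available.
function withShellToken(url: string, token: string | null): string {
  if (!token) return url;
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return `${url}${separator}shell_token=${encodeURIComponent(token)}`;
}

// Usage in the browser, assuming the token was read from window.location.search:
//   const source = new EventSource(
//     withShellToken("/api/platform/v1/topology/events", token),
//   );
```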
Validator enforcement (R002, added 2026-05-17):
@ipdesk/ipos-validator ships R002 in defaultRules. It scans lib/, src/, components/, pages/, app/ for files that reference '/api/platform/v1/' but do NOT contain any of shell_token / x-ipos-shell-token / x-ipos-shell and emits a warning diagnostic. This rule runs at cli, pack, upload, and build layers. The CLI ipos pack will surface the warning before you ship; the server-side upload also runs it as a defense-in-depth.
If your app uses a shared helper imported from another file (false positive), make sure the file that actually does the fetch also imports / references a token marker so R002 sees it.
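To reason about which files R002 will flag, the rule as described above can be approximated per file as follows. This is a sketch of the described behavior, not the code in @ipdesk/ipos-validator; in particular, matching the path as a plain substring is an assumption.

```typescript
const PLATFORM_PREFIX = "/api/platform/v1/";
// Markers listed in the rule description.
const TOKEN_MARKERS = ["shell_token", "x-ipos-shell-token", "x-ipos-shell"];

// True when a file references a platform route but mentions no token marker.
function wouldTriggerR002(fileContent: string): boolean {
  if (fileContent.indexOf(PLATFORM_PREFIX) === -1) return false;
  return !TOKEN_MARKERS.some((marker) => fileContent.indexOf(marker) !== -1);
}
```

Under this reading, a file that calls a shared helper but never mentions a marker itself is flagged, which is the false-positive case described above.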
Past incident: ipdesk-legal v1.1.0 shipped without this — every search returned no session or bearer token in the UI. v1.1.1 patched lib/api.ts to include readShellToken(). R002 now catches this class of bug pre-ship.
Available AI Agents (auto-generated)
This section is generated at request time from the live ai_actions table. Customer agents reading this catalog should reuse an existing slug when possible rather than declaring a new one.
| slug | name | description | input_type | output_shape | example_use_case |
|---|---|---|---|---|---|
| classify | 郵件自動分類 | Automatically classifies emails and applies labels | text | text | — |
| composer_agent | Composer Agent | Agent App Composer - converts natural language to Workspace DAG | text | text | — |
| extract | 資料移轉助手 | Parses uploaded files (Excel/Word/PDF) and automatically maps fields to the system schema | text | — | |
| legacy_ocr | 通用 PDF OCR (legacy) | Replaces the legacy v1 /api/platform/v1/ocr direct fetch to api.mistral.ai. Routes through the centralized ai_providers/ai_actions framework so operators can swap models without code changes. | image | text | — |
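| legal-search | 台灣法律查詢 | Uses IPDesk Legal's built-in tools to search Taiwan's national laws and regulations, Judicial Yuan judgments, Grand Justice interpretations, and Constitutional Court rulings, and cites answers by article, JID, or interpretation number. | text | text | — |
| os_copilot | OS Copilot Agent | IPOS desktop AI assistant: operates installed apps and runs skills via tool-calling. | text | text | — |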
| scan_doc_classify | 掃描文書辨識 (Layer A) | Tier 1/2 - Gemini 2.5 Flash via OR for legal-doc classification | text | text | — |
| scan_doc_classify_pro | 掃描文書辨識 Pro (Layer A) | Tier 3 - Gemini 2.5 Pro via OR (escalates when flash leaves uncovered pages) | text | text | — |
| scan_doc_layer_b_ask | 掃描文書 Layer B 重判讀 | Sub-region high-DPI image ask | image | text | — |
| scan_doc_ocr_fallback_1 | 掃描文書 OCR 後備一 | Tier 2 - Qwen2.5-VL 72B | image | text | — |
| scan_doc_ocr_fallback_2 | 掃描文書 OCR 後備二 | Tier 3 - Gemini 2.5 Flash | image | text | — |
| scan_doc_ocr_primary | 掃描文書 OCR 主力 | Tier 1 - Qianfan-OCR-Fast (free) | image | text | — |
Available Platform Routes (auto-generated)
| id | label | serviceId |
|---|---|---|
| _log | _log | svc:_log |
| app-files | App Files | svc:app-files |
| capabilities | Capabilities | svc:capabilities |
| chains | Chains | svc:chains |
| classify | Classify | svc:classify |
| docx | Docx | svc:docx |
| email | Email | svc:email |
| fetch | Fetch | svc:fetch |
| g1002 | G1002 | svc:g1002 |
| google-patents-pdf | Google Patents Pdf | svc:google-patents-pdf |
| legal | Legal | svc:legal |
| me | Me | svc:me |
| notify | Notify | svc:notify |
| ocr | Ocr | svc:ocr |
| org | Org | svc:org |
| pdf | Pdf | svc:pdf |
| public | Public | svc:public |
| queue | Queue | svc:queue |
| scan-doc | Scan Doc | svc:scan-doc |
| search | Search | svc:search |
| storage | Storage | svc:storage |
| sync | Sync | svc:sync |
| tables | Tables | svc:tables |
| topology | Topology | svc:topology |
| uspto-proxy | Uspto Proxy | svc:uspto-proxy |
| views | Views | svc:views |
| webhook | Webhook | svc:webhook |
| xlsx | Xlsx | svc:xlsx |
Available Externals (auto-generated)
| id | label |
|---|---|
| ext:ai-agent-service | AI 代理服務 (admin-managed) |
| ext:gemini-direct | Google Gemini API (direct, no admin proxy) |
| ext:anthropic-direct | Anthropic Claude API (direct, no admin proxy) |
| ext:openai-direct | OpenAI API (direct, no admin proxy) |
| ext:gcp-documentai | Google Cloud Document AI |
| ext:uspto | USPTO 美國專利局 |
| ext:mistral | Mistral AI (direct, no admin proxy) |
| ext:google-patents | Google Patents |
| ext:aws-s3 | AWS S3 |
| ext:stripe | Stripe |