ROADMAP

What's in progress, what's coming in v0.3, decisions discarded with rationale.

13 / 14 · planning · v0.2.81

Open work for alpi. Shipped work lives in CHANGELOG.md — this file never repeats it. For technical reference of what currently ships, see ARCHITECTURE.md.

Audience: the creator (@soyjavi) and any future contributor reading the repo cold.

Legend: 🔵 backlog · 🟡 next up · 🔴 gate · ⏸ blocked.


Open items

| ID | Item | Target | Status |
| --- | --- | --- | --- |
| AR | v0.3 production release — website + content rewrite | v0.3 | 🟡 v0.3 gate — blocks the cut |
| AT | Audit system prompt + tool descriptions vs hermes | v0.3 | 🔵 research first |
| AI | Memory v2 — better generation + TUI panel | v0.3 | 🔵 research first |
| AJ | Browser realism — session persistence + login state + deeper antibot | v0.3 | 🔵 |
| AO | Default skills bundle | v0.3 | 🔵 intentionally empty — bundled skills emerge from concrete recurring patterns, not catalog imitation |
| AQ | Voice mode polish — STT + TTS quality + continuous mode | v0.3 | 🔵 |
| BB | TUI: shared link renderer (bold + underline; hover = accent bg + black) | v0.3 | 🔵 |
| BC | External security audit before v0.3 public release | v0.3 | 🔴 gate on AR |
| BD | Model-aware tool-use-enforcement guidance (Claude/MiMo brevity, GPT/Codex/Gemini full block) | v0.3 | 🔵 needs A/B on agent.log first |
| ALP.2 | Alpi Link Protocol — inter-machine Noise-protocol transport + budget / rate-limit enforcement | v0.4 | 🔵 depends on ALP.1 (shipped) |
| ALP.3 | Alpi Link Protocol — shared rooms (group chat, humans optional) | v0.4 | 🔵 depends on ALP.1 (shipped) |
| H | Home Assistant integration | long-term | ⏸ blocked on user confirmation |
| N | Image generation | long-term | 🔵 no concrete use case yet |
| U | Signal gateway (signal-cli) | long-term | 🔵 requires dedicated phone number |
| Σ.1 | Mixture-of-agents tool (ensemble inference) | stretch | 🔵 not planned, tracked for later |
| Σ.2 | RL training / fine-tuning hooks | stretch | 🔵 not planned, tracked for later |

v0.3.0 ships when AR lands — every other v0.3 row is "nice to have before the cut, not required". ALP.2 + ALP.3 are firmly v0.4.


Principles

alpi respects the ToS of every provider it integrates with. When an LLM vendor (OpenAI, Anthropic, …) offers a paid subscription for a first-party client (ChatGPT Plus/Pro, Claude Pro/Max, Claude Code), that subscription is for THAT client. Reverse-engineering the private OAuth flow of the official CLI to route a third-party agent against the same quota is a breach of that vendor's ToS.

The competitor landscape (hermes, similar third-party agents) routinely ships "Codex OAuth" / "Claude Code OAuth" features. alpi does not, and will not. If a vendor publishes an official OAuth-for-third-parties flow in the future (documented, stable, bindable), we adopt it then.

Practical consequence: users pay per-token API access through their own keys. That cost is honest and visible. Subscription routing is not on the roadmap.

See the Why alpi is built like this section in README.md for how the six Satoshi Ltd. principles (Privacy by Design, User Sovereignty, Security First, Open Source, Zero Knowledge, Digital Sovereignty) map to concrete choices in this repo.


v0.3 cycle

AR. v0.3 production release — website + content rewrite (v0.3 gate)

v0.3 is the first release intended for public consumption. That implies a public presence (a static site) and a content pass across README.md, docs/*, and the future landing page, all aligned with satoshi-ltd.com.

Deliverables before cutting v0.3.0:

  1. Static site — single-page, minimal, matching satoshi-ltd.com visual language. Lives in site/ at the repo root; deploys from there.
  2. README rewrite with the new positioning. Today's README is install-first; the new one leads with why alpi, with install demoted to a section.
  3. docs/ARCHITECTURE.md + docs/SECURITY.md audited for old framing ("experimental", "personal-use", "stretch goal") that no longer fits a production release.
  4. A short launch post for the personal blog / X account — optional, but the effort pays off once.

v0.3.0 doesn't ship until AR lands. The code is already v0.3-shaped (CLI shrunk, observability in, doctor live, centralised logs, ALP.1 shipped); what's missing is the narrative to back it.

AT. Audit system prompt + tool descriptions vs hermes

Research-first. Today alpi/prompts/system_prompt.md + each tool's description field are our main levers for how the LLM uses alpi. They've been tweaked reactively (add a line when a model misbehaves, compress when the prompt bloats) but never audited as a whole.

What to compare. Hermes is the closest reference codebase (see the memory entry for its path). For each alpi tool, read the hermes equivalent side by side and note the differences.

System prompt. Same exercise for system_prompt.md: read our current version against hermes' system prompt, look for load-bearing guidance we're missing or redundant text we can drop. Bias toward shorter — every token in the system prompt is paid on every turn.

Done criterion. A short report listing the 3–5 concrete edits worth making, each with before / after + a rationale tied to observed behaviour in agent.log or sessions. Apply the edits that clear the bar; leave the rest.

Why research-first. "Rewrite all tool descriptions" is the easy way to waste a week. Measure first, edit surgically.

AI. Memory v2 — generation + TUI panel

Two sub-tasks, research-first:

  1. Generation quality. Revisit the memory tool description and body. Open questions: are we writing the right type per signal? Is the 70% Jaccard dedup too loose or too tight? Should the tool take a "confidence" field so low-confidence writes auto-expire? Compare against hermes and the latest public memory patterns (Mem0, Letta) and pick what fits our scope.
  2. TUI panel. /memory today shows the three files verbatim. Options: section-collapsible view, edit-in-place, "forget this" quick action, filter by type.

Ship 1 first (server-side quality) then 2 (surface improvements).
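To make the dedup question concrete, here is a minimal sketch of a 70% token-level Jaccard check; the whitespace tokenisation and the ≥ 0.70 reject rule are assumptions for illustration, not a description of the shipped code:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two memory entries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def is_duplicate(new: str, existing: list[str], threshold: float = 0.70) -> bool:
    """Reject a write when it is >= 70% similar to any stored entry."""
    return any(jaccard(new, old) >= threshold for old in existing)
```

Under this scheme "user prefers dark mode" is rejected against a stored "user prefers dark mode always" (4/5 = 0.8), which is the kind of near-miss the "too loose / too tight" question has to answer.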

AJ. Browser realism — Cloudflare + captcha survival

Research-first. What exists: Playwright with playwright-stealth, humanised typing, per-profile browser/state.json. The open question is whether the current posture clears common anti-bot checkpoints — Cloudflare's "verify you are human" interstitial, Turnstile, hCaptcha challenges when they fire on the agent's traffic.

Step 1 — measurement. Build a scorecard script that runs the browser tool against the standard detection sites (bot.sannysoft.com, abrahamjuliot.github.io/creepjs, the Cloudflare "Are you under attack" demo) and captures what each detector reports. That grounds the gap analysis.

Step 2 — analysis. With the scorecard in hand, identify the top 3 signals we fail (webdriver flag, audio context fingerprint, canvas, WebGL, timing patterns, …) and decide which are worth closing. Not everything is worth chasing: a perfect stealth score is a moving target and extreme measures (full fingerprint rotation, residential proxy) carry their own risk.
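The "top 3 signals" step reduces to a ranking over the scorecard. A minimal sketch, assuming the per-site detector results have already been collected into a dict (the collection itself, driving the browser tool against each detection site, is not shown):

```python
def summarize(results: dict[str, dict[str, bool]]) -> list[tuple[str, int]]:
    """Rank detection signals by how many detector sites flag them.

    results maps site name -> {signal name: flagged?}, e.g. the output
    of a run against bot.sannysoft.com and creepjs.
    """
    counts: dict[str, int] = {}
    for signals in results.values():
        for signal, flagged in signals.items():
            if flagged:
                counts[signal] = counts.get(signal, 0) + 1
    # Most-flagged first: these are the signals worth closing.
    return sorted(counts.items(), key=lambda kv: -kv[1])
```

A signal flagged by every detector (webdriver, say) outranks one flagged by a single site, which is exactly the prioritisation Step 2 calls for.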

Step 3 — implementation. Land the improvements behind the existing browser tool surface — no new config knobs unless strictly needed. Session persistence and login-state detection are adjacent concerns that naturally fall out of this work (a cookie-expired page looks different from a logged-in one); fold them in when the detection scaffold makes it cheap.

AO. Default skills bundle

BE ships the infrastructure to bundle skills under @alpi/*. This item is the curation side — what, if anything, to include.

Current position: no bundled skills. We deliberately resisted shipping a catalog of methodology skills imported from other agents (hermes has 59; most are off-scope for alpi). The ethos is "ship what you use" — bundle only skills that encode recurring patterns we actually observe in real usage, not generic write/code/web guides.

Candidates evaluated so far have all been deferred.

Trigger for shipping a bundled skill: noticing the same workflow scaffolding being re-asked 3+ times in real sessions across profiles. When that happens, one targeted SKILL.md plus an e2e test lands on its own, not as part of a bundle.

AQ. Voice mode polish — STT + TTS + continuous mode

The voice primitives shipped (tts, stt tools, Telegram voice inbound/outbound) but the surface still feels like two utility tools, not a first-class mode.

Open areas to evaluate before committing scope: STT accuracy, TTS quality and latency, and the shape of continuous mode.

Start with a measurement pass (record a few real prompts, check STT accuracy + TTS latency end-to-end), then pick the two or three biggest wins.

BB. TUI: shared link renderer

Markdown links ([text](url)) today render with Rich's default style — underlined text in the base foreground colour. It works, but it blends into body text when the terminal theme is low-contrast.

Proposed look: bold + underline at rest; on hover, accent background with black text.

Scope: one shared renderer, adopted by every TUI widget that prints links.

Done criterion. Walking through /memory, /help, a chat reply containing links, and an error message with a link renders them all with the same two-state visual. No widget has its own link style.
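As a sketch of the two-state visual in Rich console markup — the accent colour name is an assumption ("bright_cyan" stands in for the theme's real accent), and the shipped renderer may plug into Rich differently:

```python
def link_markup(text: str, url: str, hovered: bool = False) -> str:
    """Shared two-state link style in Rich console markup.

    Rest state: bold + underline.
    Hover state: accent background + black text.
    "bright_cyan" is a placeholder for the real accent colour.
    """
    style = "bold underline black on bright_cyan" if hovered else "bold underline"
    return f"[{style}][link={url}]{text}[/link][/]"
```

Every widget calling this one helper is what makes the done criterion checkable: there is exactly one place where link styling lives.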

BC. External security audit before v0.3 public release

Gate on AR. v0.3 is the first release intended for public consumption (docs/ROADMAP.md → AR). Before we cut it, contract an external firm for a formal audit.

Scope of the engagement.

Output. A public report lives at docs/audits/v0.3-<vendor>.md (or linked from there). Issues found are either fixed before the release or documented in the report with a timeline. The report being published is part of the trust story — sitting on findings isn't.

Why external, not internal. Satoshi Ltd. builds the tool; an independent security firm reads it. The Satoshi principle "Open Source — Auditable code. Reproducible builds. Trust, but verify" applies to the organisation too.

BD. Model-aware tool-use-enforcement guidance

Gate the "Actually CALL the tool…" paragraph in alpi/prompts/system_prompt.md on model family. Claude / MiMo / Qwen / Sonnet / Opus follow tool instructions well without the long enforcement block; GPT / Codex / Gemini / Gemma / Grok need it. Hermes gates this via a model-substring list; measure on agent.log before committing.
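The substring gate hermes uses could look like this in alpi; the substring list and helper name are assumptions, and the final list should come out of the agent.log A/B, not this sketch:

```python
# Model families that empirically need the long enforcement block
# (assumed substrings -- to be confirmed by the agent.log A/B).
VERBOSE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok")

def needs_enforcement_block(model: str) -> bool:
    """True when the system prompt should include the full
    'Actually CALL the tool' enforcement paragraph."""
    m = model.lower()
    return any(s in m for s in VERBOSE_ENFORCEMENT_MODELS)
```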

Output: short report showing tool-call rate on a Claude session with vs without the block (same prompts). Apply the split only if no regression on the shorter variant.


v0.4 cycle

Before ALP, alpi agents couldn't talk to each other. ALP is alpi's own closed protocol for agent↔agent communication: intra-profile on the same machine, inter-machine over the public internet, and shared rooms for N-agent workspaces. Security + privacy are hard requirements — every message is signed + encrypted, every peer is explicitly pinned (no discovery, no TOFU), every capability is fail-closed. Spec at docs/ALP.md. Three phases; ALP.1 (intra-profile) shipped in v0.2.68.

ALP.2 — Inter-machine Noise-protocol transport (v0.4)

Depends on ALP.1. New transport alpi/alp/noise.py + gateway listener — TCP listener with Noise_XK handshake producing forward-secret session keys, per-peer AEAD encryption on top. Explicitly NOT HTTPS: we use Noise (same framework as WireGuard) so we don't drag TLS's 30-year legacy of downgrade attacks and cert-management headaches into a peer-to-peer tool. Peer entry gains address: host:port. Same verbs as ALP.1, different transport. Tailscale / WireGuard as a network-layer front-end is the blessed deployment; direct public-internet exposure is supported but discouraged.

Also in ALP.2: budget + rate-limit enforcement. The peers.yaml budget.tokens_per_day, budget.usd_per_day, and rate_limit.requests_per_minute fields already parse in ALP.1 but don't enforce. ALP.2 ships the ledger + UTC-midnight reset + the -32005 budget-exceeded response path.

ALP.3 — Shared rooms (v0.4)

Depends on ALP.1. First-class group-chat workspaces — N alpis (different profiles, different machines) post into a shared transcript; a human can join via TUI /room or stay out entirely. Hub model (the room creator holds transcript + group key), not gossip. Per-room agent budget and kill switch as safety levers. New verbs (room.create, room.join, room.post, room.pull, room.leave, room.pause), rekey on member leave.


Long-term / stretch

H. Home Assistant integration

Only if @soyjavi runs Home Assistant. Hermes has homeassistant_tool as a reference. Requires HA_URL + a long-lived token in .env. Typical uses: read sensors, toggle lights/scenes, query occupancy. Blocked on confirmation that HA is part of the setup.
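The read-sensors path is a single REST call; the /api/states/&lt;entity_id&gt; endpoint and Bearer auth are Home Assistant's documented interface, while the HA_TOKEN variable name and the default URL here are assumptions:

```python
import os
import urllib.request

def ha_state_request(entity_id: str) -> urllib.request.Request:
    """Build a Home Assistant REST call to read one entity's state.

    HA_URL comes from .env (per the item above); HA_TOKEN is an assumed
    name for the long-lived access token variable.
    """
    base = os.environ.get("HA_URL", "http://homeassistant.local:8123")
    token = os.environ.get("HA_TOKEN", "")
    return urllib.request.Request(
        f"{base}/api/states/{entity_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
```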

N. Image generation

generate_image(prompt, style) using the active vision model or a dedicated endpoint (DALL-E, SD). Useful for "make me a quick logo" requests. Low priority unless a concrete use case appears.

U. Signal gateway (signal-cli)

Signal has the best security posture of any consumer messenger, but integration requires a dedicated phone number for the bot (you can't bot your own number — Signal won't allow two sessions simultaneously in a useful way). signal-cli runs as a local daemon exposing an HTTP/JSON-RPC endpoint; we just POST/GET messages.

Scope. alpi/gateway/platforms/signal.py talking to a locally-running signal-cli daemon --http 127.0.0.1:…. First-run: user registers a bot number, follows signal-cli's captcha + SMS verify flow once (signal-cli -u <num> register), then alpi setup → Gateways → Signal stores the daemon URL + allowlist of sender numbers.
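The send path is thin enough to sketch: build a JSON-RPC 2.0 "send" request and POST it to the local daemon. The method and parameter names follow signal-cli's JSON-RPC interface as commonly documented, but should be verified against the installed daemon version:

```python
import json

def send_payload(recipient: str, text: str, request_id: int = 1) -> str:
    """JSON-RPC 2.0 body for signal-cli's 'send' method.

    Param names (recipient, message) are taken from signal-cli's
    JSON-RPC docs; check them against the daemon in use -- this is
    a sketch, not the shipped gateway client.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "send",
        "params": {"recipient": [recipient], "message": text},
    })
```

The polling loop and HTTP client wrapping this payload account for most of the estimated ~200 LOC.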

Estimated LOC: ~200 (HTTP client + polling loop + send).

Blocker: requires extra SIM / VoIP number. Real cost: ~$5/mo (Twilio / JustCall). Niche unless you want E2EE + self-hosted.

Σ.1. Mixture-of-agents (stretch goal)

Spawn multiple LLMs on the same prompt, aggregate answers with a final synthesizer. Hermes has this as mixture_of_agents_tool.py. Use case: hard decisions where one model is weak and you want "wisdom of crowds" at 3× cost.

Not planned — tracked here because it's a known technique and might become useful if we hit a ceiling on single-model research quality.

Σ.2. RL training / fine-tuning hooks (stretch goal)

Hermes has rl_training_tool.py for recording agent runs and building training datasets. If we ever want to fine-tune a smaller local model on your actual conversation patterns, the dataset-collection scaffold would live here.

Not planned. Research-grade, irrelevant for everyday personal use.


Decisions discarded — don't relitigate

Rejected integrations / providers:

Rejected architecture attempts:

Rejected behaviours:

Rejected dependencies: