ai-chatexport/FUTURE.md
JesseMarkowitz 473d02f71a feat: v0.4.0 — rich content support with typed blocks and loss visibility
Extracts per-message content into a typed `blocks` list (text, code,
thinking, tool_use, tool_result, image_placeholder, file_placeholder,
unknown) and renders them at exporter write time. Voice transcripts,
Custom Instructions, and image references now appear in exports
instead of being silently dropped.

Foundation:
- src/blocks.py: pure block constructors, _safe_fence (fence-corruption
  defense, verified live in Joplin), _blockquote_prefix, render
- src/loss_report.py: per-run tally surfaced as INFO summary at end of
  export so silently-dropped data becomes visible

Providers:
- ChatGPT: dispatch on content_type produces typed blocks; voice shapes
  (audio_transcription, audio_asset_pointer, real_time_user_audio_video_
  asset_pointer) locked from live DevTools capture; Custom Instructions
  bug fix (parts-vs-direct-fields); role filter lifted; hidden-context
  marker driven by is_visually_hidden_from_conversation flag
- Claude: defensive dispatch for text/thinking/tool_use/tool_result/image
  with recursive nested-block flattening; untested against real rich-
  content data — fix-forward in v0.4.1

Exporter:
- Markdown renders from blocks at write time via render_blocks_to_markdown;
  backward-compat fallback to content for any pre-v0.4.0 cached data

Tests:
- 27 new tests across providers, exporters, CLI; fixtures rebuilt with
  real-shape ChatGPT voice + Custom Instructions cases
- 181/181 pass

Behavior changes (intentional):
- JSON output omits content; consumers should read blocks
- Per-conversation message counts increase (Custom Instructions, image-
  only, tool-only messages now appear)
- Existing exports not auto-re-rendered; users wanting fresh output run
  cache --clear then export

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 23:17:18 -04:00


# Planned Future Work

Items completed in each release are moved to the changelog. Items here are
designed for but not yet implemented. The codebase is structured to make each
of these additions straightforward.

**Completed:**

- v0.1.0 — Core export: ChatGPT + Claude, incremental sync, Markdown + JSON output
- v0.2.0 — Joplin import automation (`joplin` command, create/update notes, notebook auto-creation)
- v0.4.0 — Rich content support: typed message blocks (text, code, thinking, tool_use, tool_result, image_placeholder, file_placeholder, unknown); ChatGPT voice transcripts as text + audio placeholders; Custom Instructions extraction; data-loss visibility via `LossReport` summary and visible `unknown` blocks

---
## Export `--force` Flag (v0.2.x)

Add `--force` to the `export` command to re-export already-cached conversations
without permanently clearing the entire manifest. Useful for re-generating files
after changing the Markdown template or output structure.

Implementation: pass a `force=True` flag to `cache.get_new_or_updated()`, which
returns all conversations regardless of cache state when force is True.

Current workaround: `python -m src.main cache --clear`, then re-run export.
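A sketch of the cache-side change, assuming a manifest keyed by conversation id with an `updated_at` comparison (the real `Cache` in `src/cache.py` will differ in detail):

```python
from dataclasses import dataclass, field

# Hypothetical minimal cache; the real Cache class and manifest
# shape in src/cache.py differ, but the force bypass is the same idea.
@dataclass
class Cache:
    manifest: dict = field(default_factory=dict)  # conv_id -> updated_at

    def get_new_or_updated(self, conversations, force=False):
        """Return conversations that need exporting.

        With force=True, skip the manifest comparison entirely and
        return everything, without touching the manifest itself.
        """
        if force:
            return list(conversations)
        return [
            c for c in conversations
            if self.manifest.get(c["id"]) != c["updated_at"]
        ]
```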
## Joplin `--force` Flag (v0.2.x)

Similarly, add `--force` to the `joplin` command to re-sync all cached
conversations to Joplin regardless of whether they've been synced before.
Useful after making formatting changes to the Markdown exporter.

Implementation: in `get_joplin_pending()`, return all entries that have a
`file_path` when `force=True`, ignoring `joplin_synced_at`.
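A sketch of that selection logic, with the manifest entry shape assumed for illustration:

```python
def get_joplin_pending(entries, force=False):
    """Select manifest entries to sync to Joplin.

    Hypothetical entry shape: a dict with 'file_path' and an optional
    'joplin_synced_at' timestamp; the real manifest entries differ.
    Entries without a file_path were never exported and are always skipped.
    """
    exported = [e for e in entries if e.get("file_path")]
    if force:
        return exported  # re-sync everything that has been exported
    return [e for e in exported if not e.get("joplin_synced_at")]
```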
## Per-Conversation Cache Reset (v0.2.x)

Add `cache --reset --conversation <id>` to force re-export or re-sync of a
single conversation without clearing the entire provider cache.
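A sketch of what the reset could do, assuming a flat JSON manifest keyed by conversation id (the real layout may nest entries per provider):

```python
import json
from pathlib import Path

def reset_conversation(manifest_path: Path, conversation_id: str) -> bool:
    """Delete one conversation's manifest entry so the next export
    re-processes it. Returns False if the id was not in the manifest.
    """
    data = json.loads(manifest_path.read_text())
    if conversation_id not in data:
        return False
    del data[conversation_id]
    manifest_path.write_text(json.dumps(data, indent=2))
    return True
```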
Current workaround: manually edit `~/.ai-chat-exporter/manifest.json` and
delete the entry, then re-run export.

---
## Official API Fallback (v0.3.0)

If the unofficial internal web API approach breaks, migrate to official export
file parsing as a fallback:

- ChatGPT: parse `conversations.json` from Settings → Export Data
- Claude: parse `conversations.json` from Settings → Privacy → Export Data

The `BaseProvider` abstract class is intentionally designed so that a
`FileProvider` subclass can implement the same interface
(`list_conversations`, `get_conversation`, `normalize_conversation`)
without any changes to cache, exporters, or CLI code.
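Under those assumptions, a skeleton of `file_chatgpt.py` might look like this (the constructor signature and ZIP handling are illustrative; a real ChatGPT export nests messages in a `mapping` tree that normalization would have to walk):

```python
import json
import zipfile
from pathlib import Path

class FileChatGPTProvider:
    """File-based provider skeleton exposing the BaseProvider interface.

    Only the three method names come from the plan above; everything
    else here is assumed for illustration.
    """

    def __init__(self, input_file: str):
        path = Path(input_file)
        if path.suffix == ".zip":
            # Official exports ship conversations.json inside a ZIP.
            with zipfile.ZipFile(path) as zf:
                raw = zf.read("conversations.json")
        else:
            raw = path.read_bytes()
        self._conversations = json.loads(raw)

    def list_conversations(self):
        return [{"id": c.get("id"), "title": c.get("title")}
                for c in self._conversations]

    def get_conversation(self, conversation_id):
        return next(c for c in self._conversations
                    if c.get("id") == conversation_id)

    def normalize_conversation(self, raw):
        # A real implementation would walk raw["mapping"] into typed
        # blocks; passing through keeps this skeleton honest about scope.
        return raw
```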
To add this: implement `src/providers/file_chatgpt.py` and
`src/providers/file_claude.py`, then add an `--input-file` flag to the
export command to accept a pre-downloaded export ZIP or JSON.

---
## Binary Content Downloads (v0.5.0)

v0.4.0 ships placeholders for images and audio assets but does not download
the binary content. The `_safe_fence`-wrapped placeholders include the asset
reference (`sediment://...` or `file-service://...`), MIME type, size, and
duration where available; the actual bytes are not preserved.

Next steps:

- Download attached images alongside the Markdown export, save under a
  `media/` sibling directory with a stable filename derived from the asset
  reference.
- Replace `image_placeholder` rendering with an inline `![](relative/path)`
  reference once the file is on disk.
- Joplin integration: upload binaries as Joplin resources via `POST /resources`,
  rewrite the rendered Markdown to use `:/resourceId` references, and track
  the resource ID in the cache manifest so re-syncs stay idempotent.
- DALL-E images on the assistant side: not observed in this user's data; the
  code path exists (`source = "model_generated"`) but is untested.

The block-level schema is already in place — only the file-fetch + rewrite
layer needs to be added. See the `image_placeholder` and `file_placeholder`
block definitions in `src/blocks.py`.
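For the stable-filename step, hashing the asset reference is one option. A sketch (the extension map is illustrative only; a real version might consult the `mimetypes` module or the provider's metadata):

```python
import hashlib
import re

def media_filename(asset_ref: str, mime_type: str = "") -> str:
    """Derive a stable media/ filename from an asset reference such as
    'sediment://file_abc123' or 'file-service://file-XYZ'.

    A short content-independent hash of the full reference keeps names
    unique even if two refs share the same trailing id segment, and the
    same ref always maps to the same filename (idempotent re-exports).
    """
    digest = hashlib.sha256(asset_ref.encode()).hexdigest()[:12]
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", asset_ref.split("/")[-1]) or "asset"
    ext = {"image/png": ".png", "image/jpeg": ".jpg",
           "audio/mpeg": ".mp3"}.get(mime_type, "")
    return f"{stem}-{digest}{ext}"
```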
## Reclassify o1/o3 Reasoning Subparts (v0.4.1)

v0.4.0 leaves dict parts inside `text` content_type messages with shape
`{"summary": ..., "content": ...}` rendered as plain text (defensive — the
shape was inferred from a code comment, not captured live). Once a real
reasoning conversation is captured, reclassify these as `thinking` blocks.
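If the inferred shape holds up against a live capture, the reclassification could be a small dispatch change like this sketch (the plain block dicts stand in for the real constructors in `src/blocks.py`):

```python
def classify_text_part(part):
    """Route one part of a text-content_type message to a block type.

    A dict shaped like {"summary": ..., "content": ...} is treated as a
    reasoning subpart and becomes a thinking block; everything else
    stays text. The shape check is a guess pending a real capture.
    """
    if isinstance(part, dict) and {"summary", "content"} <= part.keys():
        text = "\n\n".join(str(part[k]) for k in ("summary", "content") if part[k])
        return {"type": "thinking", "text": text}
    return {"type": "text", "text": str(part)}
```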
## Suppress Hidden Context (v0.4.x)

If Custom Instructions duplication across conversations becomes a storage
problem, add an `EXPORTER_INCLUDE_HIDDEN_CONTEXT=false` env var. The toggle is
a single `os.getenv()` check at the start of
`_extract_editable_context_blocks` in `src/providers/chatgpt.py` — return an
empty list if disabled.

---
## Scheduled / Watch Mode (v0.5.0)

Add a `watch` command (or cron integration helper) to run exports automatically
on a schedule:

```bash
python -m src.main watch --interval 6h   # poll every 6 hours
```
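Internally the command could be as simple as the following sketch (the `run_once_fn` callable, the injectable `_sleep`, and the interval suffix grammar are all assumptions for illustration):

```python
import time

def parse_interval(spec: str) -> float:
    """Parse a CLI interval like '90s', '30m', or '6h' into seconds.
    The suffix grammar is an assumed format, not an implemented one."""
    units = {"s": 1, "m": 60, "h": 3600}
    return float(spec[:-1]) * units[spec[-1]]

def watch(run_once_fn, interval_seconds: float, once: bool = False,
          _sleep=time.sleep):
    """Run one full export+joplin pass, then sleep and repeat.

    once=True performs a single pass and returns, which is the
    cron-friendly mode described below.
    """
    while True:
        run_once_fn()
        if once:
            return
        _sleep(interval_seconds)
```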
This would run `export` + `joplin` in sequence, then sleep. Alternatively,
provide a `cron` command that prints the correct crontab line for the user's
setup.

Implementation: simple loop with `time.sleep()`, or emit a crontab entry
string that calls the export and joplin commands in sequence. A `--once`
flag would do a single run then exit (useful for cron itself).

---
## Obsidian Vault Output (v0.5.0)

Add an `obsidian` command (or `--target obsidian` flag) to sync exported
conversations into an Obsidian vault directory. The current Markdown format
is already largely compatible; the main differences are:

- Obsidian uses YAML frontmatter `properties` (same format, already supported)
- Tags should use `#tag` inline or `tags:` list in frontmatter (already done)
- Wikilinks (`[[Title]]`) instead of Markdown links — optional, Obsidian
  supports both
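Because the output is already compatible, a copy-only syncer suffices. A sketch (the `ObsidianSyncer` name follows this plan; the skip-unchanged-files check by content comparison is an assumption):

```python
import shutil
from pathlib import Path

class ObsidianSyncer:
    """Copy rendered Markdown into a vault, preserving the folder
    structure relative to the export directory. Pure file I/O, no API."""

    def __init__(self, export_dir: str, vault_dir: str):
        self.export_dir = Path(export_dir)
        self.vault_dir = Path(vault_dir)

    def sync(self) -> int:
        """Copy new or changed .md files; return how many were copied."""
        copied = 0
        for src in self.export_dir.rglob("*.md"):
            dest = self.vault_dir / src.relative_to(self.export_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            if not dest.exists() or src.read_bytes() != dest.read_bytes():
                shutil.copy2(src, dest)
                copied += 1
        return copied
```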
Implementation: the existing `MarkdownExporter` output is already valid in
Obsidian. An `ObsidianSyncer` class (mirroring `JoplinClient`) would simply
copy files to the vault directory and maintain a flat or nested folder
structure matching the user's Obsidian setup. No API needed — just file I/O.

---
## Joplin Nested Notebooks (future)

Currently notebooks are flat: `ChatGPT - My Project`. Joplin supports nested
notebooks via `parent_id`. A future option (`JOPLIN_NESTED_NOTEBOOKS=true`)
could create a two-level hierarchy:

```
ChatGPT/
  My Project/
  No Project/
Claude/
  Budget Tracker/
```
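A sketch of the two-level lookup, assuming `get_or_create_notebook` gains an optional `parent_id` parameter (Joplin's folder objects do carry a `parent_id` field; the client method signature here is hypothetical):

```python
def get_or_create_nested(client, provider_name, project_name):
    """Find or create the provider notebook, then the project notebook
    as its child. `client` stands in for JoplinClient."""
    parent = client.get_or_create_notebook(provider_name, parent_id=None)
    return client.get_or_create_notebook(project_name, parent_id=parent["id"])
```

With an in-memory fake client, calling this twice for the same pair returns the same notebook, so re-syncs stay idempotent.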
Implementation: `get_or_create_notebook` would first find/create the provider
notebook, then find/create the project notebook as a child.

---
## Token Expiry Notifications (future)

Proactively warn when a token is close to expiry (within 48h for ChatGPT),
rather than only surfacing the warning at startup. Options:

- Add an `expiry` subcommand that prints token status and exits non-zero if
  any token is expired or expiring soon (useful in scripts/cron)
- Send a desktop notification via `notify-send` (Linux) or `osascript` (macOS)
  when a token is within 24h of expiry
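The `expiry` subcommand's core check could be sketched as follows (the token mapping shape and exit-code convention are assumptions; desktop notification wiring is omitted):

```python
from datetime import datetime, timedelta, timezone

def expiry_status(tokens, now=None, soon=timedelta(hours=48)):
    """Check token expiry for scripting use.

    tokens: mapping of provider name -> tz-aware expiry datetime.
    Returns (exit_code, lines): 0 all ok, 1 something expires soon,
    2 something already expired.
    """
    now = now or datetime.now(timezone.utc)
    code, lines = 0, []
    for name, exp in tokens.items():
        if exp <= now:
            code = max(code, 2)
            lines.append(f"{name}: EXPIRED at {exp.isoformat()}")
        elif exp - now <= soon:
            code = max(code, 1)
            lines.append(f"{name}: expires in {exp - now}")
        else:
            lines.append(f"{name}: ok")
    return code, lines
```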
---
## Search Command (future)

Add a `search` command to full-text search across all exported Markdown files:

```bash
python -m src.main search "kubernetes ingress"
python -m src.main search "kubernetes ingress" --provider claude --project devops
```

Implementation: `grep`/`ripgrep` over `EXPORT_DIR`; display results with
conversation title, date, and a snippet. No index needed — Markdown files are
small enough to grep directly.
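A pure-Python stand-in for the grep approach might look like this sketch (the `<EXPORT_DIR>/<Provider>/...` directory layout is an assumption, and `--project` filtering is omitted):

```python
import re
from pathlib import Path

def search_exports(export_dir: str, query: str, provider: str = None):
    """Case-insensitive literal search over exported Markdown files.

    Yields (path, line_number, matching_line) tuples. Assumes the top
    directory under export_dir names the provider, so the --provider
    filter is a simple prefix check on the relative path.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    root = Path(export_dir)
    for path in sorted(root.rglob("*.md")):
        top = path.relative_to(root).parts[0]
        if provider and top.lower() != provider.lower():
            continue
        for n, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if pattern.search(line):
                yield path, n, line.strip()
```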