# Planned Future Work

Items completed in each release are moved to the changelog. Items here are
designed for but not yet implemented. The codebase is structured to make each
of these additions straightforward.

**Completed:**

- v0.1.0 — Core export: ChatGPT + Claude, incremental sync, Markdown + JSON output
- v0.2.0 — Joplin import automation (`joplin` command, create/update notes, notebook auto-creation)
- v0.4.0 — Rich content support: typed message blocks (text, code, thinking, tool_use, tool_result, image_placeholder, file_placeholder, unknown); ChatGPT voice transcripts as text + audio placeholders; Custom Instructions extraction; data-loss visibility via `LossReport` summary and visible `unknown` blocks

---
## Export `--force` Flag (v0.2.x)

Add `--force` to the `export` command to re-export already-cached conversations
without permanently clearing the entire manifest. Useful for regenerating files
after changing the Markdown template or output structure.

Implementation: pass a `force=True` flag to `cache.get_new_or_updated()`, which
returns all conversations regardless of cache state when `force` is true.

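A minimal sketch of how the flag could thread through, using an in-memory stand-in for the manifest-backed cache (the `entries` shape and the `updated_at` comparison are assumptions; the real method lives on the project's cache class):

```python
from dataclasses import dataclass, field

@dataclass
class Cache:
    """Hypothetical in-memory stand-in for the manifest-backed cache."""
    entries: dict = field(default_factory=dict)  # conversation_id -> updated_at

    def get_new_or_updated(self, conversations, force=False):
        # force=True bypasses the cache check and returns everything,
        # so already-exported conversations get re-rendered.
        if force:
            return list(conversations)
        return [
            c for c in conversations
            if self.entries.get(c["id"]) != c["updated_at"]
        ]

cache = Cache(entries={"abc": "2024-01-01"})
convs = [
    {"id": "abc", "updated_at": "2024-01-01"},  # unchanged: normally skipped
    {"id": "def", "updated_at": "2024-02-01"},  # new: always exported
]
assert [c["id"] for c in cache.get_new_or_updated(convs)] == ["def"]
assert [c["id"] for c in cache.get_new_or_updated(convs, force=True)] == ["abc", "def"]
```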
Current workaround: `python -m src.main cache --clear`, then re-run export.

## Joplin `--force` Flag (v0.2.x)

Similarly, add `--force` to the `joplin` command to re-sync all cached
conversations to Joplin regardless of whether they've been synced before.
Useful after making formatting changes to the Markdown exporter.

Implementation: in `get_joplin_pending()`, return all entries that have a
`file_path` when `force=True`, ignoring `joplin_synced_at`.

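A sketch of that filter, assuming a flat manifest mapping of conversation id to entry (the exact manifest layout is an assumption):

```python
def get_joplin_pending(entries, force=False):
    """Manifest entries that still need a Joplin sync.

    Only entries with a rendered file_path are syncable; force=True
    ignores joplin_synced_at.
    """
    return [
        e for e in entries.values()
        if e.get("file_path") and (force or not e.get("joplin_synced_at"))
    ]

manifest = {
    "a": {"file_path": "a.md", "joplin_synced_at": "2024-01-01"},  # synced
    "b": {"file_path": "b.md", "joplin_synced_at": None},          # pending
    "c": {"file_path": None},                                      # never exported
}
assert [e["file_path"] for e in get_joplin_pending(manifest)] == ["b.md"]
assert [e["file_path"] for e in get_joplin_pending(manifest, force=True)] == ["a.md", "b.md"]
```

Entries without a `file_path` never qualify, even under `--force`, since there is nothing on disk to sync.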
## Per-Conversation Cache Reset (v0.2.x)

Add `cache --reset --conversation <id>` to force re-export or re-sync of a
single conversation without clearing the entire provider cache.

Current workaround: manually edit `~/.ai-chat-exporter/manifest.json` and
delete the entry, then re-run export.

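The reset itself is small. A sketch, assuming a flat `{conversation_id: entry}` manifest layout (the real manifest may nest entries per provider):

```python
import json
import tempfile
from pathlib import Path

def reset_conversation(manifest_path, conversation_id):
    """Drop one conversation's manifest entry so the next run re-processes it."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text())
    removed = manifest.pop(conversation_id, None) is not None
    path.write_text(json.dumps(manifest, indent=2))
    return removed

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "manifest.json"
    p.write_text(json.dumps({"abc": {"file_path": "abc.md"}}))
    assert reset_conversation(p, "abc") is True   # entry removed
    assert reset_conversation(p, "abc") is False  # already gone
```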
---
## Official API Fallback (v0.3.0)

If the unofficial internal web API approach breaks, migrate to official export
file parsing as a fallback:

- ChatGPT: parse `conversations.json` from Settings → Export Data
- Claude: parse `conversations.json` from Settings → Privacy → Export Data

The `BaseProvider` abstract class is intentionally designed so that a
`FileProvider` subclass can implement the same interface
(`list_conversations`, `get_conversation`, `normalize_conversation`)
without any changes to cache, exporters, or CLI code.

To add this: implement `src/providers/file_chatgpt.py` and
`src/providers/file_claude.py`, then add an `--input-file` flag to the
export command to accept a pre-downloaded export ZIP or JSON.

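A skeleton of what such a subclass could look like. The three method names come from the interface above; the `BaseProvider` stub, the class name `FileChatGPTProvider`, and the export-file field names are assumptions for illustration:

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

class BaseProvider(ABC):
    """Assumed shape of the existing abstract interface."""
    @abstractmethod
    def list_conversations(self): ...
    @abstractmethod
    def get_conversation(self, conversation_id): ...
    @abstractmethod
    def normalize_conversation(self, raw): ...

class FileChatGPTProvider(BaseProvider):
    """Reads a pre-downloaded conversations.json instead of the web API."""

    def __init__(self, input_file):
        self._raw = json.loads(Path(input_file).read_text())
        self._by_id = {c["id"]: c for c in self._raw}

    def list_conversations(self):
        return [{"id": c["id"], "title": c.get("title", "")} for c in self._raw]

    def get_conversation(self, conversation_id):
        return self._by_id[conversation_id]

    def normalize_conversation(self, raw):
        # The real version would reuse the existing ChatGPT message walker.
        return {"id": raw["id"], "title": raw.get("title", ""), "messages": []}

with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "conversations.json"
    f.write_text(json.dumps([{"id": "c1", "title": "Hello"}]))
    prov = FileChatGPTProvider(f)
    assert prov.list_conversations() == [{"id": "c1", "title": "Hello"}]
```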
---
## Binary Content Downloads (v0.5.0)

v0.4.0 ships placeholders for images and audio assets but does not download
the binary content. The `_safe_fence`-wrapped placeholders include the asset
reference (`sediment://...` or `file-service://...`), MIME type, size, and
duration where available; the actual bytes are not preserved.

Next steps:

- Download attached images alongside the Markdown export, save under a
  `media/` sibling directory with a stable filename derived from the asset
  reference.
- Replace `image_placeholder` rendering with an inline Markdown image
  reference once the file is on disk.
- Joplin integration: upload binaries as Joplin resources via `POST /resources`,
  rewrite the rendered Markdown to use `:/resourceId` references, and track
  the resource ID in the cache manifest so re-syncs stay idempotent.
- DALL-E images on the assistant side: not observed in this user's data; the
  code path exists (`source = "model_generated"`) but is untested.

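The stable-filename step is worth pinning down early, since re-exports must not duplicate media. One possible scheme (the MIME-to-extension map and hash length are assumptions):

```python
import hashlib

def media_filename(asset_ref, mime_type="image/png"):
    """Stable filename derived from the asset reference, so re-exports
    reuse the same file instead of downloading a duplicate."""
    ext = {"image/png": "png", "image/jpeg": "jpg", "audio/mpeg": "mp3"}.get(
        mime_type.lower(), "bin")
    digest = hashlib.sha256(asset_ref.encode()).hexdigest()[:16]
    return f"{digest}.{ext}"

name = media_filename("sediment://file_abc123", "image/png")
assert name == media_filename("sediment://file_abc123", "image/png")  # stable
assert name.endswith(".png")
```

Hashing the reference rather than slugifying it sidesteps unsafe characters in `sediment://` and `file-service://` URIs.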

The block-level schema is already in place; only the file-fetch and rewrite
layer needs to be added. See the `image_placeholder` and `file_placeholder`
block definitions in `src/blocks.py`.

## Reclassify o1/o3 Reasoning Subparts (v0.4.1)

v0.4.0 leaves dict parts inside `text` content_type messages with shape
`{"summary": ..., "content": ...}` rendered as plain text (defensive: the
shape was inferred from a code comment, not captured live). Once a real
reasoning conversation is captured, reclassify these as `thinking` blocks.

## Suppress Hidden Context (v0.4.x)

If Custom Instructions duplication across conversations becomes a storage
problem, add an `EXPORTER_INCLUDE_HIDDEN_CONTEXT=false` env var. The toggle is
a single `os.getenv()` check at the start of
`_extract_editable_context_blocks` in `src/providers/chatgpt.py`: return an
empty list if disabled.

---
## Scheduled / Watch Mode (v0.5.0)

Add a `watch` command (or cron integration helper) to run exports automatically
on a schedule:

```bash
python -m src.main watch --interval 6h   # poll every 6 hours
```

This would run `export` + `joplin` in sequence, then sleep. Alternatively,
provide a `cron` command that prints the correct crontab line for the user's
setup.

Implementation: a simple loop with `time.sleep()`, or emit a crontab entry
string that calls the export and joplin commands in sequence. A `--once`
flag would do a single run and then exit (useful for cron itself).

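The loop could look like this; `parse_interval` and its `6h`-style format are assumptions matching the example above:

```python
import time

def parse_interval(spec):
    """Parse '6h' / '30m' / '45s' into seconds (format is an assumption)."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(spec[:-1]) * units[spec[-1]]

def watch(run_pipeline, interval="6h", once=False):
    """Run the export+joplin pipeline, then sleep until the next poll."""
    while True:
        run_pipeline()
        if once:  # single pass, then exit: cron calls this repeatedly
            return
        time.sleep(parse_interval(interval))

runs = []
watch(lambda: runs.append("export+joplin"), once=True)
assert runs == ["export+joplin"]
assert parse_interval("6h") == 21600
```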
---
## Obsidian Vault Output (v0.5.0)

Add an `obsidian` command (or `--target obsidian` flag) to sync exported
conversations into an Obsidian vault directory. The current Markdown format
is already largely compatible; the main differences are:

- Obsidian uses YAML frontmatter `properties` (same format, already supported)
- Tags should use `#tag` inline or a `tags:` list in frontmatter (already done)
- Wikilinks (`[[Title]]`) instead of Markdown links (optional; Obsidian
  supports both)

Implementation: the existing `MarkdownExporter` output is already valid in
Obsidian. An `ObsidianSyncer` class (mirroring `JoplinClient`) would simply
copy files to the vault directory and maintain a flat or nested folder
structure matching the user's Obsidian setup. No API needed; just file I/O.

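A sketch of that syncer, mirroring the export tree into the vault (the class name comes from the text above; everything else is an assumption):

```python
import shutil
import tempfile
from pathlib import Path

class ObsidianSyncer:
    """Copies rendered Markdown into a vault, mirroring the export layout."""

    def __init__(self, vault_dir):
        self.vault_dir = Path(vault_dir)

    def sync(self, export_dir):
        copied = []
        for src in sorted(Path(export_dir).rglob("*.md")):
            dest = self.vault_dir / src.relative_to(export_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 preserves mtimes for later diffing
            copied.append(dest)
        return copied

with tempfile.TemporaryDirectory() as d:
    export, vault = Path(d) / "export", Path(d) / "vault"
    (export / "ChatGPT").mkdir(parents=True)
    (export / "ChatGPT" / "note.md").write_text("# hi")
    ObsidianSyncer(vault).sync(export)
    assert (vault / "ChatGPT" / "note.md").read_text() == "# hi"
```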
---
## Joplin Nested Notebooks (future)

Currently notebooks are flat: `ChatGPT - My Project`. Joplin supports nested
notebooks via `parent_id`. A future option (`JOPLIN_NESTED_NOTEBOOKS=true`)
could create a two-level hierarchy:

```
ChatGPT/
  My Project/
  No Project/
Claude/
  Budget Tracker/
```

Implementation: `get_or_create_notebook` would first find/create the provider
notebook, then find/create the project notebook as a child.

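The two-step lookup, sketched against an in-memory stand-in for the Joplin data API (the `find_or_create` helper is hypothetical; the real client makes HTTP calls):

```python
class FakeJoplin:
    """In-memory stand-in for the Joplin data API (real calls go over HTTP)."""

    def __init__(self):
        self.notebooks = {}  # (title, parent_id) -> notebook id
        self._next = 0

    def find_or_create(self, title, parent_id=""):
        key = (title, parent_id)
        if key not in self.notebooks:
            self._next += 1
            self.notebooks[key] = f"nb{self._next}"
        return self.notebooks[key]

def get_or_create_notebook(api, provider, project, nested=True):
    if not nested:
        return api.find_or_create(f"{provider} - {project}")  # current flat scheme
    parent = api.find_or_create(provider)                     # e.g. "ChatGPT"
    return api.find_or_create(project, parent_id=parent)      # child notebook

api = FakeJoplin()
first = get_or_create_notebook(api, "ChatGPT", "My Project")
assert get_or_create_notebook(api, "ChatGPT", "My Project") == first  # idempotent
```

Keying on `(title, parent_id)` lets two providers each have a project with the same name without colliding.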
---
## Token Expiry Notifications (future)

Proactively warn when a token is close to expiry (within 48h for ChatGPT),
rather than only surfacing the warning at startup. Options:

- Add an `expiry` subcommand that prints token status and exits non-zero if
  any token is expired or expiring soon (useful in scripts/cron)
- Send a desktop notification via `notify-send` (Linux) or `osascript` (macOS)
  when a token is within 24h of expiry

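The status classification behind both options could be as small as this (function name and return values are assumptions):

```python
from datetime import datetime, timedelta, timezone

def expiry_status(expires_at, warn_within=timedelta(hours=48)):
    """Classify a token expiry timestamp for scripting/notification use."""
    now = datetime.now(timezone.utc)
    if expires_at <= now:
        return "expired"   # expiry subcommand would exit non-zero
    if expires_at - now <= warn_within:
        return "expiring"  # warn, or fire a desktop notification
    return "ok"

soon = datetime.now(timezone.utc) + timedelta(hours=24)
assert expiry_status(soon) == "expiring"
```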
---
## Search Command (future)

Add a `search` command to full-text search across all exported Markdown files:

```bash
python -m src.main search "kubernetes ingress"
python -m src.main search "kubernetes ingress" --provider claude --project devops
```


Implementation: `grep`/`ripgrep` over `EXPORT_DIR`, display results with
conversation title, date, and a snippet. No index needed; Markdown files are
small enough to grep directly.

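If shelling out to `grep` proves awkward, a pure-Python fallback is only a few lines. A sketch, assuming the provider can be matched against the file path (the tuple result shape is an assumption):

```python
import re
import tempfile
from pathlib import Path

def search(export_dir, query, provider=None):
    """Naive full-text search over exported Markdown; no index needed."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for path in Path(export_dir).rglob("*.md"):
        if provider and provider.lower() not in str(path).lower():
            continue  # crude provider filter via directory/file name
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if pattern.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits

with tempfile.TemporaryDirectory() as d:
    sub = Path(d) / "claude"
    sub.mkdir()
    (sub / "conv.md").write_text("Setting up Kubernetes ingress rules")
    assert search(d, "kubernetes ingress")[0][0] == "conv.md"
    assert search(d, "kubernetes ingress", provider="chatgpt") == []
```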
|