Planned Future Work
Items completed in each release are moved to the changelog. The items below have been designed for but are not yet implemented; the codebase is structured to make each of these additions straightforward.
Completed:
- v0.1.0 — Core export: ChatGPT + Claude, incremental sync, Markdown + JSON output
- v0.2.0 — Joplin import automation (joplin command, create/update notes, notebook auto-creation)
- v0.4.0 — Rich content support: typed message blocks (text, code, thinking, tool_use, tool_result, image_placeholder, file_placeholder, unknown); ChatGPT voice transcripts as text + audio placeholders; Custom Instructions extraction; data-loss visibility via LossReport summary and visible unknown blocks
Export --force Flag (v0.2.x)
Add --force to the export command to re-export already-cached conversations
without permanently clearing the entire manifest. Useful for re-generating files
after changing the Markdown template or output structure.
Implementation: pass a force=True flag to cache.get_new_or_updated(), which
returns all conversations regardless of cache state when force is True.
Current workaround: python -m src.main cache --clear then re-run export.
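Sketched below, assuming the manifest maps conversation IDs to last-seen update timestamps (the dict shape and field names are illustrative, not the real manifest schema):

```python
class Cache:
    def __init__(self, manifest=None):
        self._manifest = manifest or {}  # assumed shape: id -> last-seen updated_at

    def get_new_or_updated(self, conversations, force=False):
        if force:
            # --force: return everything, ignoring cache state entirely
            return list(conversations)
        return [
            c for c in conversations
            if self._manifest.get(c["id"]) != c["updated_at"]
        ]
```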
Joplin --force Flag (v0.2.x)
Similarly, add --force to the joplin command to re-sync all cached
conversations to Joplin regardless of whether they've been synced before.
Useful after making formatting changes to the Markdown exporter.
Implementation: in get_joplin_pending(), return all entries that have a
file_path when force=True, ignoring joplin_synced_at.
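A sketch under the same caveat; the entry fields come from the description above, the surrounding structure is assumed:

```python
def get_joplin_pending(manifest, force=False):
    entries = manifest.values()
    if force:
        # --force: everything already exported to disk is eligible
        return [e for e in entries if e.get("file_path")]
    return [
        e for e in entries
        if e.get("file_path") and not e.get("joplin_synced_at")
    ]
```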
Per-Conversation Cache Reset (v0.2.x)
Add cache --reset --conversation <id> to force re-export or re-sync of a
single conversation without clearing the entire provider cache.
Current workaround: manually edit ~/.ai-chat-exporter/manifest.json and
delete the entry, then re-run export.
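This is essentially the manual workaround automated. A sketch, assuming the manifest is a flat JSON object keyed by conversation ID (the real schema may nest entries per provider):

```python
import json
from pathlib import Path

MANIFEST = Path.home() / ".ai-chat-exporter" / "manifest.json"

def reset_conversation(conversation_id):
    data = json.loads(MANIFEST.read_text())
    if data.pop(conversation_id, None) is None:
        return False  # nothing cached under that ID
    MANIFEST.write_text(json.dumps(data, indent=2))
    return True
```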
Official API Fallback (v0.3.0)
If the unofficial internal web API approach breaks, migrate to official export file parsing as a fallback:
- ChatGPT: parse conversations.json from Settings → Export Data
- Claude: parse conversations.json from Settings → Privacy → Export Data
The BaseProvider abstract class is intentionally designed so that a
FileProvider subclass can implement the same interface
(list_conversations, get_conversation, normalize_conversation)
without any changes to cache, exporters, or CLI code.
To add this: implement src/providers/file_chatgpt.py and
src/providers/file_claude.py, then add --input-file flag to the
export command to accept a pre-downloaded export ZIP or JSON.
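A rough shape for the ChatGPT side, assuming the official export's conversations.json is a JSON array of conversation objects; the field names are illustrative:

```python
import json

class FileChatGPTProvider:
    """Hypothetical file-backed provider implementing the BaseProvider
    interface methods named above."""

    def __init__(self, input_file):
        with open(input_file, encoding="utf-8") as f:
            self._raw = json.load(f)

    def list_conversations(self):
        return [{"id": c.get("id"), "title": c.get("title")} for c in self._raw]

    def get_conversation(self, conversation_id):
        return next((c for c in self._raw if c.get("id") == conversation_id), None)

    def normalize_conversation(self, raw):
        raise NotImplementedError  # map the export schema onto the internal model
```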
Binary Content Downloads (v0.5.0)
v0.4.0 ships placeholders for images and audio assets but does not download
the binary content. The _safe_fence-wrapped placeholders include the asset
reference (sediment://... or file-service://...), MIME type, size, and
duration where available; the actual bytes are not preserved.
Next steps:
- Download attached images alongside the Markdown export, save under a media/ sibling directory with a stable filename derived from the asset reference.
- Replace image_placeholder rendering with an inline Markdown image reference once the file is on disk.
- Joplin integration: upload binaries as Joplin resources via POST /resources, rewrite the rendered Markdown to use :/resourceId references, and track the resource ID in the cache manifest so re-syncs stay idempotent.
- DALL-E images on the assistant side: not observed in this user's data; the code path exists (source = "model_generated") but is untested.
The block-level schema is already in place — only the file-fetch + rewrite
layer needs to be added. See the image_placeholder and file_placeholder
block definitions in src/blocks.py.
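For the Joplin leg, the upload is a single multipart request to the Data API's POST /resources endpoint. A sketch using requests; base_url and token handling are assumed to match the existing client:

```python
import json
import requests

def upload_resource(base_url, token, path, title):
    with open(path, "rb") as f:
        resp = requests.post(
            f"{base_url}/resources",
            params={"token": token},
            files={
                "data": (title, f),
                "props": (None, json.dumps({"title": title})),
            },
        )
    resp.raise_for_status()
    return resp.json()["id"]  # becomes the :/resourceId used in the Markdown
```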
Reclassify o1/o3 Reasoning Subparts (v0.4.1)
v0.4.0 leaves dict parts inside text content_type messages with shape
{"summary": ..., "content": ...} rendered as plain text (defensive — the
shape was inferred from a code comment, not captured live). Once a real
reasoning conversation is captured, reclassify these as thinking blocks.
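The change would likely be a one-branch addition to the part dispatch. A sketch; the returned dicts are illustrative stand-ins for the real constructors in src/blocks.py, and the shape check still needs live confirmation:

```python
def classify_text_part(part):
    if isinstance(part, dict) and {"summary", "content"} <= part.keys():
        # inferred o1/o3 reasoning shape -> thinking block
        return {"type": "thinking",
                "text": str(part.get("content", "")),
                "summary": part.get("summary")}
    return {"type": "text", "text": str(part)}
```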
Suppress Hidden Context (v0.4.x)
If Custom Instructions duplication across conversations becomes a storage
problem, add an EXPORTER_INCLUDE_HIDDEN_CONTEXT=false env var. The toggle is
a single os.getenv() check at the start of
_extract_editable_context_blocks in src/providers/chatgpt.py — return
empty list if disabled.
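The whole toggle, sketched with the existing extraction logic elided:

```python
import os

def _extract_editable_context_blocks(message):
    if os.getenv("EXPORTER_INCLUDE_HIDDEN_CONTEXT", "true").lower() == "false":
        return []  # suppress Custom Instructions / hidden context entirely
    ...  # existing extraction logic unchanged
```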
Scheduled / Watch Mode (v0.5.0)
Add a watch command (or cron integration helper) to run exports automatically
on a schedule:
python -m src.main watch --interval 6h # poll every 6 hours
This would run export + joplin in sequence, then sleep. Alternatively,
provide a cron command that prints the correct crontab line for the user's
setup.
Implementation: simple loop with time.sleep(), or emit a crontab entry
string that calls the export and joplin commands in sequence. A --once
flag would do a single run then exit (useful for cron itself).
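A sketch of the loop; run_export and run_joplin stand in for the existing command entry points, and interval parsing is left out:

```python
import time

def watch(interval_seconds, run_export, run_joplin, once=False):
    while True:
        run_export()
        run_joplin()
        if once:
            return  # --once: single pass, then exit (cron calls this)
        time.sleep(interval_seconds)
```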
Obsidian Vault Output (v0.5.0)
Add an obsidian command (or --target obsidian flag) to sync exported
conversations into an Obsidian vault directory. The current Markdown format
is already largely compatible; the main differences are:
- Obsidian uses YAML frontmatter properties (same format, already supported)
- Tags should use #tag inline or a tags: list in frontmatter (already done)
- Wikilinks ([[Title]]) instead of Markdown links — optional, Obsidian supports both
Implementation: the existing MarkdownExporter output is already valid in
Obsidian. An ObsidianSyncer class (mirroring JoplinClient) would simply
copy files to the vault directory and maintain a flat or nested folder
structure matching the user's Obsidian setup. No API needed — just file I/O.
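A sketch of that syncer, assuming the vault should mirror the EXPORT_DIR folder layout:

```python
import shutil
from pathlib import Path

class ObsidianSyncer:
    def __init__(self, vault_dir):
        self.vault_dir = Path(vault_dir)

    def sync(self, export_dir):
        for md_file in Path(export_dir).rglob("*.md"):
            dest = self.vault_dir / md_file.relative_to(export_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(md_file, dest)  # preserves mtimes for change detection
```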
Joplin Nested Notebooks (future)
Currently notebooks are flat: ChatGPT - My Project. Joplin supports nested
notebooks via parent_id. A future option (JOPLIN_NESTED_NOTEBOOKS=true)
could create a two-level hierarchy:
ChatGPT/
  My Project/
  No Project/
Claude/
  Budget Tracker/
Implementation: get_or_create_notebook would first find/create the provider
notebook, then find/create the project notebook as a child.
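Sketched against the existing client; find_notebook and create_notebook are assumed helper names wrapping the Joplin folder endpoints:

```python
class JoplinClient:
    def get_or_create_notebook(self, provider_name, project_name):
        # Two-level lookup: provider notebook first, then the project
        # notebook as its child.
        parent = (self.find_notebook(provider_name)
                  or self.create_notebook(provider_name))
        title = project_name or "No Project"
        child = self.find_notebook(title, parent_id=parent["id"])
        return child or self.create_notebook(title, parent_id=parent["id"])
```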
Token Expiry Notifications (future)
Proactively warn when a token is close to expiry (within 48h for ChatGPT), rather than only surfacing the warning at startup. Options:
- Add an expiry subcommand that prints token status and exits non-zero if any token is expired or expiring soon (useful in scripts/cron; see the sketch after this list)
- Send a desktop notification via notify-send (Linux) or osascript (macOS) when a token is within 24h of expiry
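A sketch of the subcommand's core, assuming expiry timestamps are already tracked per provider (the mapping shape is illustrative):

```python
import sys
import time

def expiry_status(expiries, threshold_hours=48):
    """expiries: assumed mapping of provider name -> Unix expiry timestamp."""
    now = time.time()
    expiring = {n: ts for n, ts in expiries.items()
                if ts - now < threshold_hours * 3600}
    for name, ts in expiring.items():
        print(f"{name}: token expires in {max(0, ts - now) / 3600:.0f}h")
    sys.exit(1 if expiring else 0)
```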
Search Command (future)
Add a search command to full-text search across all exported Markdown files:
python -m src.main search "kubernetes ingress"
python -m src.main search "kubernetes ingress" --provider claude --project devops
Implementation: grep/ripgrep over EXPORT_DIR, display results with
conversation title, date, and a snippet. No index needed — Markdown files are
small enough to grep directly.
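A sketch that shells out to ripgrep, assuming exports land under provider/project subdirectories of EXPORT_DIR (both the layout and the flag mapping are illustrative):

```python
import subprocess

def search(export_dir, query, provider=None, project=None):
    path = export_dir
    if provider:
        path = f"{path}/{provider}"
    if project:
        path = f"{path}/{project}"
    # --heading groups matches per file, -n adds line numbers, -i ignores case
    subprocess.run(["rg", "--heading", "-n", "-i", query, path])
```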