# AI Chat Exporter

A personal backup tool for ChatGPT and Claude conversation history. Exports your chats to Markdown files and syncs them to [Joplin](https://joplinapp.org/) as notes. Each conversation becomes a single `.md` file with YAML frontmatter, organised into folders that map directly to Joplin notebooks.

Supports incremental sync, so only new or updated conversations are exported on each run. Every run is resumable: if interrupted, re-running picks up exactly where it left off.

---

## ⚠️ Terms of Service Warning

**Read this before using this tool.**

This tool works by accessing **unofficial, undocumented internal web API endpoints** used by the ChatGPT and Claude web apps. These endpoints are not publicly supported by OpenAI or Anthropic and are subject to change or removal without notice.

**Use of this tool may conflict with their Terms of Service:**
- OpenAI: https://openai.com/policies/terms-of-use
- Anthropic: https://www.anthropic.com/legal/consumer-terms

**By using this tool, you accept that:**
- You are using it entirely at your own risk
- Your account could potentially be suspended for automated or scripted access
- The internal APIs this tool relies on may break at any time without notice
- This tool is for **personal archival use only**, not commercial use

This tool is designed for a single user backing up their own conversations. Do not use it to scrape data at scale or for any commercial purpose.

---

## Installation

### Linux / macOS

```bash
git clone <repo-url>
cd ai-chat-exporter
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

### Windows

No admin access required. Run these in **Command Prompt** (`cmd.exe`). It's the simplest option on Windows because it doesn't have PowerShell's script execution policy restrictions.

```bat
git clone <repo-url>
cd ai-chat-exporter
python -m venv .venv
.venv\Scripts\activate
pip install -e ".[dev]"
```

All `ai-chat-exporter` commands work identically in Command Prompt.

**Using PowerShell instead?** If you prefer PowerShell, you may need to allow script execution first (one-time, current user only):

```powershell
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
```

Then activate the venv and run commands the same way.

**Prerequisites:**
- Python 3.11 or later: install from [python.org](https://www.python.org/downloads/windows/). During installation, tick **"Add Python to PATH"**.
- Git: install from [git-scm.com](https://git-scm.com/) if not already present.

**Notes:**
- The cache manifest and logs are stored in `cache\` inside the install directory, the same as on Linux.
- File permission hardening (`chmod 600`) is silently ignored on Windows, which is not a concern for single-user desktop use.
- Joplin Web Clipper runs on `localhost:41184` on all platforms; no configuration changes needed.

---

## First Run: Run Doctor

Before anything else, validate your setup:

```bash
ai-chat-exporter doctor
```

This checks token presence, format, expiry, directory permissions, disk space, and live API connectivity. Fix any failures before proceeding.

---

## Getting Your Session Tokens

Session tokens are how your browser stays logged in. This tool uses them to access your chat history on your behalf.

### Token Lifetimes

| Provider | Cookie Name | Lifetime | Expiry Detection |
|----------|-------------|----------|------------------|
| ChatGPT | `__Secure-next-auth.session-token.0` + `.1` | ~7 days | JWT `exp` claim (decoded automatically) |
| Claude | `sessionKey` | ~30 days | Only detectable via 401 response |

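To illustrate the "JWT `exp` claim" column, here is a minimal sketch of reading an expiry time from a token. It assumes a standard three-part JWT whose payload is base64url-encoded JSON; `jwt_expiry` is a hypothetical helper name, not necessarily how this tool does it, and the real cookie may need extra handling.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def jwt_expiry(token: str) -> Optional[datetime]:
    """Return the UTC expiry from a JWT's `exp` claim, or None if absent."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # base64url encoding strips padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return None
    # `exp` is a Unix timestamp in seconds.
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

The signature is not verified here, which is acceptable only because the goal is to display an expiry date, not to trust the token.
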
### Finding Tokens in Chrome DevTools

1. Open the provider's website and make sure you're logged in
2. Press **F12** (Windows/Linux) or **Cmd+Option+I** (macOS) to open DevTools
3. Click the **Application** tab
4. In the left panel, expand **Cookies** and click the site URL
5. Find the cookie by name and copy its **Value**

**ChatGPT:** go to `https://chatgpt.com` → find **two** cookies:
- `__Secure-next-auth.session-token.0`: copy Value (starts with `eyJ`) → `CHATGPT_SESSION_TOKEN`
- `__Secure-next-auth.session-token.1`: copy Value → `CHATGPT_SESSION_TOKEN_1`

ChatGPT splits large session tokens across two cookies to stay under the browser's 4 KB cookie limit. Both are required.

**Claude:** go to `https://claude.ai` → find `sessionKey` → copy Value

### When Tokens Expire

When a token expires you'll see a `401 Unauthorized` error. To refresh:
- Re-run the `auth` wizard: `ai-chat-exporter auth`
- Or manually update the value in your `.env` file

---

## The `auth` Command

The easiest way to configure tokens is the interactive wizard:

```bash
ai-chat-exporter auth
```

This walks you through finding your token, validates it, shows the expiry date (ChatGPT only), and offers to write it to your `.env` automatically. Tokens are never echoed to the terminal.

---

## `.env` Setup

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

### Provider tokens

| Variable | Description |
|----------|-------------|
| `CHATGPT_SESSION_TOKEN` | ChatGPT session token chunk `.0` (starts with `eyJ…`) |
| `CHATGPT_SESSION_TOKEN_1` | ChatGPT session token chunk `.1` (the remainder) |
| `CHATGPT_PROJECT_IDS` | Comma-separated ChatGPT project IDs (see below) |
| `CLAUDE_SESSION_KEY` | Your Claude session key |

### Output

| Variable | Default | Description |
|----------|---------|-------------|
| `EXPORT_DIR` | `./exports` | Where to write exported Markdown files |
| `OUTPUT_STRUCTURE` | `provider/project/year` | Folder structure (see below) |

### Joplin

| Variable | Default | Description |
|----------|---------|-------------|
| `JOPLIN_API_TOKEN` | (none) | Authorization token from Joplin Web Clipper settings |
| `JOPLIN_API_URL` | `http://localhost:41184` | Joplin API URL (change only if you've customised the port) |
| `JOPLIN_REQUEST_TIMEOUT` | `30` | Seconds before an API call times out. Increase for very large conversations. |

### Cache & logging

| Variable | Default | Description |
|----------|---------|-------------|
| `CACHE_DIR` | `./cache` | Where to store the sync manifest |
| `LOG_FILE` | `./cache/logs/exporter.log` | Log file path (`none` to disable) |

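The defaults in the tables above could be applied roughly as follows. This is only a sketch: `Settings` and `load_settings` are hypothetical names, and the tool's actual configuration loader may differ. The variable names and default values are the ones documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    export_dir: str
    output_structure: str
    joplin_api_url: str
    joplin_request_timeout: int
    cache_dir: str
    log_file: Optional[str]  # None when file logging is disabled


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, using the documented defaults."""
    log_file = env.get("LOG_FILE", "./cache/logs/exporter.log")
    return Settings(
        export_dir=env.get("EXPORT_DIR", "./exports"),
        output_structure=env.get("OUTPUT_STRUCTURE", "provider/project/year"),
        joplin_api_url=env.get("JOPLIN_API_URL", "http://localhost:41184"),
        joplin_request_timeout=int(env.get("JOPLIN_REQUEST_TIMEOUT", "30")),
        cache_dir=env.get("CACHE_DIR", "./cache"),
        # The literal value "none" turns file logging off.
        log_file=None if log_file == "none" else log_file,
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.
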

---

## ChatGPT Projects

ChatGPT project conversations are stored separately from your main conversation list and require extra configuration.

### Finding your project IDs

1. Open ChatGPT and click a Project in the left sidebar
2. Look at the browser URL. It will look like:
   `https://chatgpt.com/g/g-p-68c2b2b3037c8191890036fb4ae3ed9f-my-project/project`
3. Copy the `g-p-…` part: the `g-p-` prefix plus the ID that follows it, stopping before the project-name slug (`-my-project` in this example)

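The same extraction can be sketched in code. This assumes the ID is `g-p-` followed by a run of lowercase hexadecimal characters, as in the example URL above; the real ID format is undocumented, and `project_id_from_url` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: the ID is "g-p-" plus hex characters; the slug after the
# next hyphen (e.g. "-my-project") is not part of the ID.
_PROJECT_ID = re.compile(r"/g/(g-p-[0-9a-f]+)")


def project_id_from_url(url: str) -> Optional[str]:
    """Return the project ID embedded in a ChatGPT project URL, if any."""
    match = _PROJECT_ID.search(url)
    return match.group(1) if match else None
```

For the example URL this yields `g-p-68c2b2b3037c8191890036fb4ae3ed9f`, which is the value to put in `CHATGPT_PROJECT_IDS`.
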
Add all your project IDs to `.env` as a comma-separated list:

```
CHATGPT_PROJECT_IDS=g-p-68c2b2b3037c8191890036fb4ae3ed9f,g-p-anotherprojectid
```

The `auth` wizard can also guide you through this step interactively.

---

## Output Structure

All exported files go under `EXPORT_DIR`. The folder structure maps directly to Joplin notebooks.

### Default: `provider/project/year`

```
exports/
├── chatgpt/
│   ├── no-project/
│   │   └── 2024/
│   │       └── 2024-03-15_my-conversation_abc12345.md
│   └── learning-python/
│       └── 2024/
│           └── 2024-03-15_async-tutorial_def67890.md
└── claude/
    ├── no-project/
    │   └── 2024/
    │       └── 2024-06-01_docker-explained_ghi11111.md
    └── startos-packaging/
        └── 2024/
            └── 2024-06-10_manifest-setup_jkl22222.md
```

### Joplin Notebook Mapping

Each provider+project combination maps to a flat Joplin notebook created automatically by the `joplin` command:

| Export folder | Joplin notebook |
|---------------|-----------------|
| `exports/chatgpt/learning-python/` | `ChatGPT - Learning Python` |
| `exports/claude/startos-packaging/` | `Claude - Startos Packaging` |
| `exports/chatgpt/no-project/` | `ChatGPT - No Project` |
| `exports/claude/no-project/` | `Claude - No Project` |

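The mapping in the table above can be reproduced with a few lines of code. This is a sketch inferred from the four example rows only; `PROVIDER_NAMES` and `notebook_title` are hypothetical names, and the real naming rule may handle more cases.

```python
# Assumption: providers have fixed display names; project folder names are
# split on hyphens and each word is capitalised.
PROVIDER_NAMES = {"chatgpt": "ChatGPT", "claude": "Claude"}


def notebook_title(provider: str, project_slug: str) -> str:
    """Build a flat Joplin notebook title from an export folder pair."""
    provider_name = PROVIDER_NAMES.get(provider, provider.title())
    project_name = " ".join(word.capitalize() for word in project_slug.split("-"))
    return f"{provider_name} - {project_name}"
```

Note that plain capitalisation explains why `startos-packaging` becomes `Startos Packaging` rather than a brand-cased name.
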
### Other `OUTPUT_STRUCTURE` options

| Value | Result |
|-------|--------|
| `provider/project/year` (default) | `exports/claude/my-project/2024/file.md` |
| `provider/project` | `exports/claude/my-project/file.md` |
| `provider/year` | `exports/claude/2024/file.md` (projects ignored) |

### Filename format

`YYYY-MM-DD_{title-slug}_{id[:8]}.md`, e.g. `2024-06-10_manifest-setup_jkl22222.md`

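Putting the folder structure and filename format together, a relative export path could be built like this. The sketch makes several assumptions: the slug rule in `slugify` is invented for illustration, `export_path` is a hypothetical helper, and which conversation date the tool uses for the `YYYY-MM-DD` prefix is not stated here.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def slugify(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of other characters become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def export_path(structure: str, provider: str, project: str,
                day: date, title: str, conv_id: str) -> PurePosixPath:
    """Build the path of one conversation relative to EXPORT_DIR."""
    values = {"provider": provider, "project": project, "year": str(day.year)}
    # OUTPUT_STRUCTURE is a "/"-separated list of folder components.
    folders = [values[part] for part in structure.split("/")]
    filename = f"{day.isoformat()}_{slugify(title)}_{conv_id[:8]}.md"
    return PurePosixPath(*folders, filename)
```

With `provider/year`, the project component is simply never looked up, which matches the "projects ignored" note in the table.
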

---

## CLI Reference

### Global flags

```
--verbose / -v    DEBUG output to console
--quiet / -q      WARNING and above only
--debug           DEBUG + full tracebacks + redacted API response bodies
--no-log-file     Disable file logging
--version         Print version and exit
```

### `auth`: Interactive token setup

```bash
ai-chat-exporter auth
```

Guided wizard to find and save session tokens and ChatGPT project IDs. Detects OS and shows the correct DevTools shortcut.

### `doctor`: Health check

```bash
ai-chat-exporter doctor
```

Checks: token presence, JWT validity and expiry, directory permissions, disk space, live API reachability. Exits with code 0 if all pass, 1 if any fail.

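Because `doctor` reports its result through the exit code, it can gate other commands in a script. The following is a minimal sketch using only the standard library; `run_if_healthy` and `main` are hypothetical names, and the only CLI behaviour relied on is the documented 0/1 exit code.

```python
import subprocess
import sys
from typing import Sequence


def run_if_healthy(check_cmd: Sequence[str], then_cmd: Sequence[str]) -> int:
    """Run then_cmd only if check_cmd exits 0; return the final exit code."""
    check = subprocess.run(check_cmd)
    if check.returncode != 0:
        # Health check failed: skip the second command entirely.
        return check.returncode
    return subprocess.run(then_cmd).returncode


def main() -> None:
    # Export only when the health check passes.
    sys.exit(run_if_healthy(["ai-chat-exporter", "doctor"],
                            ["ai-chat-exporter", "export"]))
```

Calling `main()` from a scheduled job would leave the job's own exit status reflecting whichever step failed.
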
### `export`: Export conversations

```bash
# Export everything (new/updated only)
ai-chat-exporter export

# Single provider
ai-chat-exporter export --provider claude

# JSON output
ai-chat-exporter export --format json

# Both Markdown and JSON
ai-chat-exporter export --format both

# Only conversations updated since a date
ai-chat-exporter export --since 2024-06-01

# Only conversations in a specific project (case-insensitive substring)
ai-chat-exporter export --project "learning python"

# Only conversations outside any project
ai-chat-exporter export --project none

# Write to a custom directory
ai-chat-exporter export --output /path/to/my/notes

# Preview without writing anything
ai-chat-exporter export --dry-run
```

Options: `--provider [chatgpt|claude|all]`, `--format [markdown|json|both]`, `--output PATH`, `--since YYYY-MM-DD`, `--project NAME`, `--dry-run`

### `list`: List conversations

```bash
# List all conversations for all providers
ai-chat-exporter list

# Single provider
ai-chat-exporter list --provider chatgpt

# Filter by project
ai-chat-exporter list --project "learning python"

# Only conversations outside any project
ai-chat-exporter list --project none
```

Fetches and displays all conversations without exporting them. Useful for verifying what the tool can see before running an export.

### `joplin`: Sync to Joplin

```bash
# Sync all pending conversations to Joplin
ai-chat-exporter joplin

# Preview what would be synced without sending anything
ai-chat-exporter joplin --dry-run

# Sync a single provider
ai-chat-exporter joplin --provider chatgpt

# Sync only conversations in a specific project
ai-chat-exporter joplin --project "learning python"

# Sync only conversations outside any project
ai-chat-exporter joplin --project none
```

Reads the local export cache and pushes each exported Markdown file to Joplin as a note. Notebooks are created automatically. Re-running is safe: notes are updated, not duplicated.

**Prerequisites:**
1. Run `export` first to generate the Markdown files
2. Open Joplin → Tools → Options → Web Clipper → enable the service
3. Copy the Authorization token and add `JOPLIN_API_TOKEN=<token>` to your `.env`
4. Joplin desktop must be open when you run this command

Options: `--provider [chatgpt|claude|all]`, `--project NAME`, `--dry-run`

### `cache`: Manage the sync manifest

```bash
# Show statistics
ai-chat-exporter cache --show

# Clear all cached entries (forces full re-export next run)
ai-chat-exporter cache --clear

# Clear a single provider
ai-chat-exporter cache --clear --provider claude
```

---

## How the Cache Works

The cache manifest lives at `cache/manifest.json` (inside the install directory) and records every exported conversation: its title, project, `updated_at` timestamp, output file path, and (after Joplin sync) the Joplin note ID.

On every `export` run:
1. Fetch the full conversation list from the provider
2. Compare each conversation's `updated_at` against the manifest
3. Export only conversations that are new or have been updated
4. Write each successfully exported conversation to the manifest **immediately** (not batched)

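The four export steps can be sketched as a single loop. This is an illustration under stated assumptions, not the tool's code: the manifest is modelled as a JSON object keyed by conversation ID, and `incremental_export` and the `export_one` callback are hypothetical names.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def incremental_export(conversations: Iterable[dict], manifest_path: Path,
                       export_one: Callable[[dict], str]) -> int:
    """Export new or updated conversations; return how many were exported."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    exported = 0
    for conv in conversations:
        entry = manifest.get(conv["id"])
        if entry and entry["updated_at"] == conv["updated_at"]:
            continue  # unchanged since the last run
        output_file = export_one(conv)
        manifest[conv["id"]] = {"updated_at": conv["updated_at"], "file": output_file}
        # Persist after every conversation so an interrupted run can resume.
        manifest_path.write_text(json.dumps(manifest, indent=2))
        exported += 1
    return exported
```

Writing the manifest inside the loop costs extra disk writes, but it is what makes a second run skip everything the first run finished.
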
On every `joplin` run:
1. Read the manifest to find conversations not yet synced to Joplin, or re-exported since last sync
2. Push each pending Markdown file to Joplin (create or update)
3. Store the Joplin note ID in the manifest so subsequent runs update rather than duplicate

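Step 1 of the Joplin run amounts to a filter over the manifest. A minimal sketch follows; the field names `joplin_note_id`, `exported_at`, and `joplin_synced_at` are assumptions made for illustration, and timestamps are assumed to be ISO 8601 strings in one consistent format so that string comparison orders them correctly.

```python
from typing import Dict, List


def pending_for_joplin(manifest: Dict[str, dict]) -> List[str]:
    """Return IDs that were never synced, or were re-exported after the last sync."""
    pending = []
    for conv_id, entry in manifest.items():
        never_synced = "joplin_note_id" not in entry
        # Hypothetical fields: compare last export time with last sync time.
        stale = entry.get("exported_at", "") > entry.get("joplin_synced_at", "")
        if never_synced or stale:
            pending.append(conv_id)
    return pending
```

An entry with a stored note ID is updated in place on the next push, which is how duplicates are avoided.
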
**This design makes every run inherently resumable.** If the tool is interrupted for any reason (rate limit, network drop, Ctrl+C, crash), simply re-run the same command. It will skip already-processed conversations and continue from where it stopped.

To force a full re-export: `ai-chat-exporter cache --clear` then re-run export.

---

## Troubleshooting

### `401 Unauthorized`
Your session token has expired.
- Run `ai-chat-exporter auth` to get a new token interactively
- Or manually copy a fresh cookie value into your `.env` file

Note: Claude's `sessionKey` is an opaque string, so the only way to know it has expired is the 401 error. ChatGPT JWTs have an `exp` claim that the `doctor` command can decode and display.

### `429 Rate Limited`
The tool automatically pauses, saves progress, and exits with a clear message showing how many conversations were exported vs remaining. Just re-run the same export command to resume; the cache picks up exactly where it left off.

### Joplin: "JOPLIN_API_TOKEN is not set"
You need to configure the token before running the `joplin` command:
1. Open Joplin desktop
2. Go to Tools → Options → Web Clipper
3. Enable the Web Clipper service
4. Copy the Authorization token shown on that page
5. Add `JOPLIN_API_TOKEN=<token>` to your `.env` file

### Joplin: "Joplin is not responding"
Joplin desktop must be running when you run the `joplin` command. The Web Clipper service shuts down when Joplin is closed.

### Joplin: "Joplin rejected the API token (HTTP 401)"
The token in `JOPLIN_API_TOKEN` doesn't match what Joplin expects. Get a fresh token from Joplin → Tools → Options → Web Clipper → Authorization token.

### Joplin: note timed out
If you see a timeout error, Joplin took longer than `JOPLIN_REQUEST_TIMEOUT` seconds (default: 30) to respond. Possible causes:
- The conversation is very large and Joplin is slow to index it
- Joplin is busy syncing or loading a large library
- Joplin has frozen (try restarting it)

To increase the timeout: add `JOPLIN_REQUEST_TIMEOUT=60` to your `.env`.

### ChatGPT project conversations not appearing
Make sure you've added the project IDs to `CHATGPT_PROJECT_IDS` in your `.env`. See [ChatGPT Projects](#chatgpt-projects) for how to find them. Project conversations are not included in the default conversation listing; they must be fetched separately.

### Schema warnings in logs (`Unexpected API response shape`)
The provider's internal API may have changed. Run with `--debug`, sanitize the output (remove any personal content), and check the project's GitHub Issues for known fixes.

### Non-text content warnings
Since v0.4.0, rich content is preserved as typed blocks in the export:
- ChatGPT voice transcripts render as text, and audio assets render as `📎 File attached` placeholders with size and duration metadata.
- Images render as `🖼️ Image attached` placeholders showing the asset reference.
- Custom Instructions appear under a `> ℹ️ Hidden context` marker.
- Anything the extractor doesn't recognise renders as a visible `> ⚠️ Unsupported content` block naming the type and observed keys, *and* increments a counter in the post-export summary so you can tell whether real content is being silently skipped.

Binary downloads (the actual image/audio bytes) are still deferred; see `FUTURE.md` v0.5.0.

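The unsupported-content behaviour described above (a visible marker plus a tally) can be sketched as follows. Only the `> ⚠️ Unsupported content` marker comes from this README; the rest of the rendered line, the block shape, and the name `render_unknown` are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Dict


def render_unknown(block: Dict[str, Any], tally: Counter) -> str:
    """Render an unrecognised block as a visible blockquote and count it."""
    block_type = str(block.get("type", "unknown"))
    keys = ", ".join(sorted(block))
    # The tally feeds the end-of-run summary, so skipped content stays visible.
    tally[block_type] += 1
    return f"> ⚠️ Unsupported content: `{block_type}` (keys: {keys})"
```

Keeping the counter separate from the rendered text means the summary can report totals per type even if the Markdown is never opened.
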
### Empty export / all conversations skipped
No new or updated conversations since your last run. To verify: `ai-chat-exporter cache --show`. To force a full re-export: `ai-chat-exporter cache --clear`.

### Filing a bug report
1. Run with `--debug`: `ai-chat-exporter export --debug 2>&1 | tee debug.log`
2. Remove any personal conversation content from `debug.log`
3. Open a GitHub Issue with the sanitized log and the exact command you ran

---

## Future Work

See `FUTURE.md` for planned features:

- **v0.2.x**: `export --force` flag; `joplin --force` flag; per-conversation cache reset
- **v0.3.0**: Official API fallback: parse export ZIP files from ChatGPT/Claude settings
- **v0.4.x / v0.5.0**: Binary content downloads (images, audio bytes) and Joplin resource upload; reclassify o1/o3 reasoning subparts; optional `EXPORTER_INCLUDE_HIDDEN_CONTEXT` toggle
- **v0.5.0**: Watch/scheduled mode; Obsidian vault output

---

## Security Notes

- All exported data is stored **locally only**; nothing is sent anywhere except to your local Joplin instance
- Exported files and the cache manifest are created with `600` permissions (owner read/write only)
- `.env` is in `.gitignore`. **Never commit it**
- Session tokens are never logged, printed, or included in error messages
- The Joplin API token is only ever sent to `localhost`; it never leaves your machine
- If you accidentally commit `.env`: immediately log out and back in to invalidate the token, then remove it from git history using [BFG Repo Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) or `git filter-branch`
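
For reference, creating a file with `600` permissions can be done as in the sketch below. It uses only standard-library calls; `write_private` is a hypothetical helper and not necessarily how this tool writes its files. On Windows the mode has no equivalent effect, consistent with the installation notes.

```python
import os
from pathlib import Path


def write_private(path: Path, text: str) -> None:
    """Create or overwrite a file that only the owner can read or write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    # The mode passed to os.open applies only when the file is created,
    # so tighten permissions on pre-existing files as well.
    os.chmod(path, 0o600)
```

Setting the mode at creation time avoids a window in which the file exists with looser default permissions.
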