diff --git a/README.md b/README.md
new file mode 100644
index 0000000..897fade
--- /dev/null
+++ b/README.md
@@ -0,0 +1,301 @@
+# AI Chat Exporter
+
+A personal backup tool for ChatGPT and Claude conversation history. Exports your chats to Markdown files structured for archival in [Joplin](https://joplinapp.org/). Each conversation becomes a single `.md` file with YAML frontmatter, organised into folders that map directly to Joplin notebooks.
+
+Supports incremental sync — only new or updated conversations are exported on each run. Every run is resumable: if interrupted, re-running picks up exactly where it left off.
+
+---
+
+## ⚠️ Terms of Service Warning
+
+**Read this before using this tool.**
+
+This tool works by accessing **unofficial, undocumented internal web API endpoints** used by the ChatGPT and Claude web apps. These endpoints are not publicly supported by OpenAI or Anthropic and are subject to change or removal without notice.
+
+**Use of this tool may conflict with their Terms of Service:**
+- OpenAI: https://openai.com/policies/terms-of-use
+- Anthropic: https://www.anthropic.com/legal/consumer-terms
+
+**By using this tool, you accept that:**
+- You are using it entirely at your own risk
+- Your account could potentially be suspended for automated or scripted access
+- The internal APIs this tool relies on may break at any time without notice
+- This tool is for **personal archival use only** — not commercial use
+
+This tool is designed for a single user backing up their own conversations. Do not use it to scrape data at scale or for any commercial purpose.
+
+---
+
+## Installation
+
+```bash
+git clone
+cd ai-chat-exporter
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]"
+```
+
+---
+
+## First Run: Run Doctor
+
+Before anything else, validate your setup:
+
+```bash
+python -m src.main doctor
+```
+
+This checks token presence, format, expiry, directory permissions, disk space, and live API connectivity.
+Fix any failures before proceeding.
+
+---
+
+## Getting Your Session Tokens
+
+Session tokens are how your browser stays logged in. This tool uses them to access your chat history on your behalf.
+
+### Token Lifetimes
+
+| Provider | Cookie Name | Lifetime | Expiry Detection |
+|----------|-------------|----------|-----------------|
+| ChatGPT | `__Secure-next-auth.session-token` | ~7 days | JWT `exp` claim (decoded automatically) |
+| Claude | `sessionKey` | ~30 days | Only detectable via 401 response |
+
+### Finding Tokens in Chrome DevTools
+
+1. Open the provider's website and make sure you're logged in
+2. Press **F12** (Windows/Linux) or **Cmd+Option+I** (macOS) to open DevTools
+3. Click the **Application** tab
+4. In the left panel, expand **Cookies** and click the site URL
+5. Find the cookie by name and copy its **Value**
+
+**ChatGPT:** go to `https://chatgpt.com` → find `__Secure-next-auth.session-token` → copy Value (starts with `eyJ`)
+
+**Claude:** go to `https://claude.ai` → find `sessionKey` → copy Value
+
+### When Tokens Expire
+
+When a token expires you'll see a `401 Unauthorized` error. To refresh:
+- Re-run the `auth` wizard: `python -m src.main auth`
+- Or manually update the value in your `.env` file
+
+---
+
+## The `auth` Command
+
+The easiest way to configure tokens is the interactive wizard:
+
+```bash
+python -m src.main auth
+```
+
+This walks you through finding your token, validates it, shows the expiry date (ChatGPT only), and offers to write it to your `.env` automatically. Tokens are never echoed to the terminal.
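The expiry display for ChatGPT tokens can be illustrated with a minimal sketch. It assumes the token is a standard three-segment JWT whose payload is base64url-encoded JSON; `jwt_expiry` is a hypothetical helper name, not a function taken from this project, and the signature is deliberately not verified because only the `exp` claim is being read.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def jwt_expiry(token: str) -> Optional[datetime]:
    """Return the UTC expiry from a JWT's `exp` claim, or None if unreadable.

    Hypothetical helper: assumes a header.payload.signature token with a
    readable JSON payload. The signature is not checked.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None  # opaque tokens (no three segments) carry no readable expiry
    payload_b64 = parts[1]
    # base64url segments in JWTs omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    try:
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:
        return None  # covers bad base64, bad UTF-8, and bad JSON
    exp = payload.get("exp") if isinstance(payload, dict) else None
    if not isinstance(exp, (int, float)):
        return None
    try:
        return datetime.fromtimestamp(exp, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        return None  # timestamp outside the platform's supported range
```

For an opaque value such as Claude's `sessionKey`, this helper returns `None`, which matches the table above: expiry is only detectable via a 401 response.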
+
+---
+
+## `.env` Setup
+
+Copy `.env.example` to `.env` and fill in your values:
+
+```bash
+cp .env.example .env
+```
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CHATGPT_SESSION_TOKEN` | — | Your ChatGPT JWT session token |
+| `CLAUDE_SESSION_KEY` | — | Your Claude session key |
+| `EXPORT_DIR` | `./exports` | Where to write exported files |
+| `OUTPUT_STRUCTURE` | `provider/project/year` | Folder structure (see below) |
+| `CACHE_DIR` | `~/.ai-chat-exporter` | Where to store the sync manifest |
+| `LOG_FILE` | `~/.ai-chat-exporter/logs/exporter.log` | Log file path (`none` to disable) |
+
+---
+
+## Output Structure
+
+All exported files go under `EXPORT_DIR`. The structure maps to Joplin notebooks.
+
+### Default: `provider/project/year`
+
+```
+exports/
+├── chatgpt/
+│   ├── no-project/
+│   │   └── 2024/
+│   │       └── 2024-03-15_my-conversation_abc12345.md
+│   └── learning-python/
+│       └── 2024/
+│           └── 2024-03-15_async-tutorial_def67890.md
+└── claude/
+    ├── no-project/
+    │   └── 2024/
+    │       └── 2024-06-01_docker-explained_ghi11111.md
+    └── startos-packaging/
+        └── 2024/
+            └── 2024-06-10_manifest-setup_jkl22222.md
+```
+
+### Joplin Notebook Mapping (for future automated import)
+
+| Export folder | Joplin notebook |
+|---------------|-----------------|
+| `exports/chatgpt/learning-python/` | `ChatGPT - Learning Python` |
+| `exports/claude/startos-packaging/` | `Claude - Startos Packaging` |
+| `exports/chatgpt/no-project/` | `ChatGPT - No Project` |
+| `exports/claude/no-project/` | `Claude - No Project` |
+
+### Other `OUTPUT_STRUCTURE` options
+
+| Value | Result |
+|-------|--------|
+| `provider/project/year` (default) | `exports/claude/my-project/2024/file.md` |
+| `provider/project` | `exports/claude/my-project/file.md` |
+| `provider/year` | `exports/claude/2024/file.md` (projects ignored) |
+
+### Filename format
+
+`YYYY-MM-DD_{title-slug}_{id[:8]}.md` — e.g.
+`2024-06-10_manifest-setup_jkl22222.md`
+
+---
+
+## CLI Reference
+
+### Global flags
+
+```
+--verbose / -v     DEBUG output to console
+--quiet / -q       WARNING and above only
+--debug            DEBUG + full tracebacks + redacted API response bodies
+--no-log-file      Disable file logging
+--version          Print version and exit
+```
+
+### `auth` — Interactive token setup
+
+```bash
+python -m src.main auth
+```
+
+Guided wizard to find and save session tokens. Detects OS and shows the correct DevTools shortcut.
+
+### `doctor` — Health check
+
+```bash
+python -m src.main doctor
+```
+
+Checks: token presence, JWT validity and expiry, directory permissions, disk space, live API reachability. Exits with code 0 if all pass, 1 if any fail.
+
+### `export` — Export conversations
+
+```bash
+# Export everything (new/updated only)
+python -m src.main export
+
+# Single provider
+python -m src.main export --provider claude
+
+# JSON output
+python -m src.main export --format json
+
+# Both Markdown and JSON
+python -m src.main export --format both
+
+# Only conversations updated since a date
+python -m src.main export --since 2024-06-01
+
+# Write to a custom directory
+python -m src.main export --output /path/to/my/notes
+
+# Preview without writing anything
+python -m src.main export --dry-run
+```
+
+Options: `--provider [chatgpt|claude|all]`, `--format [markdown|json|both]`, `--output PATH`, `--since YYYY-MM-DD`, `--dry-run`
+
+### `list` — List conversations
+
+```bash
+python -m src.main list --provider chatgpt
+```
+
+Fetches and displays all conversations without exporting them.
+
+### `cache` — Manage the sync manifest
+
+```bash
+# Show statistics
+python -m src.main cache --show
+
+# Clear all cached entries (forces full re-export next run)
+python -m src.main cache --clear
+
+# Clear a single provider
+python -m src.main cache --clear --provider claude
+```
+
+---
+
+## How the Cache Works
+
+The cache manifest lives at `~/.ai-chat-exporter/manifest.json` and records every exported conversation: its title, project, `updated_at` timestamp, and output file path.
+
+On every run:
+1. Fetch the full conversation list from the provider
+2. Compare each conversation's `updated_at` against the manifest
+3. Export only conversations that are new or have been updated
+4. Write each successfully exported conversation to the manifest **immediately** (not batched)
+
+**This design makes every run inherently resumable.** If the tool is interrupted for any reason — rate limit, network drop, Ctrl+C, crash — simply re-run the same command. It will skip already-exported conversations and continue from where it stopped.
+
+To force a full re-export: `python -m src.main cache --clear` then re-run export.
+
+---
+
+## Troubleshooting
+
+### `401 Unauthorized`
+Your session token has expired.
+- Run `python -m src.main auth` to get a new token interactively
+- Or manually copy a fresh cookie value into your `.env` file
+
+Note: Claude's `sessionKey` is an opaque string — the only way to know it's expired is the 401 error. ChatGPT JWTs have an `exp` claim that the `doctor` command can decode and display.
+
+### `429 Rate Limited`
+The tool automatically pauses, saves progress, and exits with a clear message showing how many conversations were exported vs remaining. Just re-run the same export command to resume — the cache picks up exactly where it left off.
+
+### Schema warnings in logs (`Unexpected API response shape`)
+The provider's internal API may have changed.
+Run with `--debug`, sanitize the output (remove any personal content), and check the project's GitHub Issues for known fixes.
+
+### Non-text content warnings
+Images, code interpreter outputs, DALL-E generations, and Claude artifacts are not exported in v0.1.0. A WARNING is logged for each skipped item. See `FUTURE.md` for the v0.4.0 roadmap.
+
+### Empty export / all conversations skipped
+No new or updated conversations since your last run. To verify: `python -m src.main cache --show`. To force a full re-export: `python -m src.main cache --clear`.
+
+### Filing a bug report
+1. Run with `--debug`: `python -m src.main export --debug 2>&1 | tee debug.log`
+2. Remove any personal conversation content from `debug.log`
+3. Open a GitHub Issue with the sanitized log and the exact command you ran
+
+---
+
+## Future Work
+
+See `FUTURE.md` for planned features:
+
+- **v0.1.x** — `export --force` flag to bypass cache for a single run
+- **v0.2.0** — Joplin integration: auto-import exported files via Joplin's local REST API
+- **v0.3.0** — Official API fallback: parse export ZIP files from ChatGPT/Claude settings
+- **v0.4.0** — Rich content: images, artifacts, code interpreter output, extended thinking
+
+---
+
+## Security Notes
+
+- All exported data is stored **locally only** — nothing is sent anywhere
+- Exported files and the cache manifest are created with `600` permissions (owner read/write only)
+- `.env` is in `.gitignore` — **never commit it**
+- Session tokens are never logged, printed, or included in error messages
+- If you accidentally commit `.env`: immediately log out and back in to invalidate the token, then remove it from git history using [BFG Repo Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) or `git filter-branch`
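The immediate manifest write described under "How the Cache Works" and the `600` permissions listed in the Security Notes can be sketched together. This is an illustration under stated assumptions, not the project's actual code: the manifest is assumed to be a JSON object keyed by conversation ID, and `needs_export` and `record_export` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def needs_export(manifest: dict, conv_id: str, updated_at: str) -> bool:
    """True if the conversation is new or its `updated_at` differs from the manifest."""
    entry = manifest.get(conv_id)
    return entry is None or entry.get("updated_at") != updated_at


def record_export(manifest_path: Path, manifest: dict, conv_id: str, entry: dict) -> None:
    """Record one exported conversation and rewrite the manifest right away.

    Assumed layout: {conversation_id: {"updated_at": ..., ...}}.
    """
    manifest[conv_id] = entry
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the file readable and writable only by the owner (0o600 on POSIX)
    fd, tmp_name = tempfile.mkstemp(dir=manifest_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(manifest, handle, indent=2)
        # os.replace swaps the file in one step on the same filesystem, so an
        # interrupted run leaves either the old manifest or the new one
        os.replace(tmp_name, manifest_path)
    except BaseException:
        # clean up the temporary file if writing or replacing failed
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Calling `record_export` once per conversation, rather than once per run, is what would let a re-run skip everything already written. The owner-only mode applies on POSIX systems; Windows handles file permissions differently.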