
We added an MCP server to our macOS app (unclutr files) and learned a lot the hard way

Last updated: February 25th, 2026

Important: MCP support is currently a direct-download build feature for unclutr files. The Mac App Store build has MCP disabled for now.

I wanted unclutr files (a macOS file cleanup app) to be usable from MCP clients like OpenAI Codex and Claude Code.

The goal was simple:

  • let users ask AI to scan folders for duplicate files
  • return structured results
  • (safely) move selected duplicates to Trash

In practice, getting from “local server works” to “real users can use it” involved a lot of debugging across MCP server implementation, stdio transport behavior, client configuration, macOS sandboxing, and distribution strategy.

This post is a summary of what worked, what failed, and what we changed.

What we built (v1)

We built a Swift-native MCP server for unclutr files with four tools:

  1. resolve_common_paths — resolves natural-language locations into absolute macOS paths (for example Downloads, Desktop, ~/Documents, external drives).
  2. scan_exact_duplicates — scans folders/files for byte-identical duplicates and returns duplicate groups, reclaimable space, and scan stats.
  3. move_to_trash — moves explicit paths to macOS Trash (not permanent delete).
  4. delete_duplicate_group_except_keep — given a duplicate group plus one keep_path, trashes the rest.

We separated scan and delete on purpose. Scanning stays read-only, deletion is explicit, and the contract is much easier to trust and debug.
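For orientation, the scan side boils down to the classic size-then-hash approach to byte-identical detection. This is an illustrative Python sketch, not the app's Swift implementation:

```python
import hashlib
import os
from collections import defaultdict

def exact_duplicate_groups(paths):
    """Group candidate files by size first (cheap), then by SHA-256 of content.

    Files that share both size and hash are treated as byte-identical duplicates.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            h = hashlib.sha256()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

The size pre-filter matters in practice: most files are eliminated without ever being read.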

Why Swift-native

I chose a Swift-native MCP server because the app is already macOS/Swift, it has direct access to macOS file APIs and Trash integration, and it avoids introducing a Node runtime dependency for end users.

That came with tradeoffs:

  • fewer MCP examples in Swift compared to Node
  • more low-level stdio protocol work
  • less forgiving crash/debug loops than in JavaScript

Dev success vs real-user failure

The first version worked from development builds. We could point Codex or Claude at a server binary buried inside Xcode DerivedData (Debug build), which proved the server itself worked.

But that was not a distribution strategy:

  • App Store users do not have Xcode / DerivedData
  • DerivedData paths are machine-specific
  • Debug binaries are not a user-facing install flow

It was an important milestone, but only a milestone.

The big lesson: “server works” != “MCP client can use it”

We repeatedly hit a confusing state where the binary worked when invoked manually and app-side probes worked, but the MCP client still failed during the initialize handshake.

The turning point was splitting debugging into layers:

Layer 1: Binary health

Can the server binary run and answer initialize and tools/list over stdio?

Layer 2: Launcher health

Can a launcher script/binary start the server with the exact path and working directory the client will use?

Layer 3: Client config health

Is Codex/Claude pointing to the correct command, arguments, and working directory?

Layer 4: Client runtime behavior

Does the client send valid framed messages, finish the handshake, and keep the session alive as expected?

Without this separation, it was very easy to “fix” the wrong layer.
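The Layer 1 check is scriptable. Below is a minimal probe sketch in Python; MCP's stdio transport is newline-delimited JSON-RPC, and the command list, protocol version string, and stub details here are illustrative assumptions, not unclutr's actual probe:

```python
import json
import subprocess

def jsonrpc_line(msg):
    """MCP stdio framing: one JSON-RPC message per newline-terminated line."""
    return (json.dumps(msg) + "\n").encode()

def probe(cmd, timeout=10):
    """Launch the server command, drive initialize -> initialized -> tools/list,
    and return every JSON reply. A hang or non-JSON output is a Layer 1 failure."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    payload = (
        jsonrpc_line({"jsonrpc": "2.0", "id": 1, "method": "initialize",
                      "params": {"protocolVersion": "2025-03-26",
                                 "capabilities": {},
                                 "clientInfo": {"name": "probe", "version": "0"}}})
        + jsonrpc_line({"jsonrpc": "2.0", "method": "notifications/initialized"})
        + jsonrpc_line({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
    )
    out, _ = proc.communicate(payload, timeout=timeout)
    return [json.loads(line) for line in out.splitlines() if line.strip()]
```

Crucially, this runs the exact command your client config will run, so a pass here and a failure in the client isolates the problem to Layers 3-4.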

The most painful bug

At one point the server was reading stdin in a way that could block or hang under real client launch behavior. Manual runs sometimes looked fine, but client-driven launches still failed.

We changed stdin reading to an async byte stream and added much better logging around request parse, response send, stdin close, and process exit.
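The shape of that fix, sketched in Python asyncio rather than the actual Swift code: read raw bytes incrementally, buffer partial lines until a full newline-delimited message arrives, and treat EOF as a clean shutdown signal instead of blocking forever.

```python
import asyncio
import json

async def read_messages(reader):
    """Yield parsed JSON-RPC messages from an incremental byte stream.

    Partial lines are buffered until complete; EOF (stdin closed) ends the
    session cleanly rather than hanging on a read that will never finish.
    """
    buf = b""
    while True:
        chunk = await reader.read(4096)
        if not chunk:  # client closed stdin: shut down, do not hang
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
```

The key property is that the loop never assumes a message arrives in one read: real clients split writes in ways a manual shell test never does.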

That improved reproducibility a lot:

  • manual probes succeeded consistently
  • app-side probes succeeded consistently
  • Codex/Claude integration failures became diagnosable instead of random

The logging also made it obvious which failures were protocol parsing, stdout write failures, process crashes (for example exit 133 / signal 5), or clients closing early.

What broke

1. It worked in Xcode / DerivedData, but nowhere else

Our first win was pointing the client at a Debug binary inside DerivedData. That proved the server implementation, but it did not prove packaging, launcher behavior, or a real-user setup flow.

Lesson: “works on my dev machine” is not even close to “works for users” for local MCP.

2. Client showed server configured, but tools were unavailable

We had periods where the MCP client UI showed the server entry, but tool calls failed because the server was not actually available.

Lesson: config presence is not the same thing as a successful initialize + tools/list handshake.

3. Handshake failures: “connection closed” during initialize

One of the most confusing phases looked like this: manual and probe runs succeeded, but Codex or Claude failed during initialize because the connection closed early.

Lesson: client launch behavior can expose stdio issues that a manual shell run never hits.

4. Stdio read/write behavior caused non-obvious failures

We saw cases where the server parsed requests but failed sending responses. Logs showed initialize received, followed by send failures, closed pipes, or intermittent crashes/retries.

Lesson: add explicit logging around request parsed, response write success/failure, stdin close, and exit code/signal. Without it, you are blind.
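A minimal sketch of that logging discipline (illustrative Python, not the real server). The one hard rule for stdio servers: stdout carries only protocol frames, so diagnostics must go to stderr or a file.

```python
import json
import logging
import sys

# stdio MCP rule: stdout is for protocol frames only, so logs go to stderr
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")

def send_response(msg, out=sys.stdout):
    """Write one response line and log success/failure explicitly."""
    try:
        out.write(json.dumps(msg) + "\n")
        out.flush()
        logging.debug("response sent id=%s", msg.get("id"))
    except (BrokenPipeError, OSError):
        logging.error("stdout closed while sending id=%s", msg.get("id"))
        raise
```

Logging the failure to *send* is what distinguishes “server crashed” from “client closed the pipe early” in postmortems.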

5. Launcher existed but was not executable

We hit repeated states where the launcher or companion file existed but was not executable.

Lesson: diagnostics should report launchability and executable-bit status, not just file existence.
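A diagnostic along these lines takes only a few lines; this is an illustrative sketch, not the app's actual check:

```python
import os

def launch_status(path):
    """Report why a launcher may fail before you even reach the sandbox layer."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "not a file"
    if not os.access(path, os.X_OK):
        return "present but not executable (check chmod +x)"
    return "launchable"
```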

6. Sandbox/container copy looked fine, but execution failed

This was the big reality check for a Mac App Store-style approach. The companion file existed and permissions looked correct, but launching from the sandbox container still failed with “Operation not permitted” (exit 126).

Lesson: macOS sandbox/runtime restrictions matter more than file presence or chmod state.

7. App Store archive validation surfaced helper/runtime issues

Archive/validation also surfaced problems around embedded executables, sandbox entitlements, hardened runtime expectations, and helper dSYM behavior.

Lesson: distribution constraints need to be part of architecture decisions early.

8. UI complexity became a debugging blocker

At one point we showed too much raw debug info by default, which made the state harder to understand quickly.

Fix: simplify the default UI, keep self-test/probe front and center, and move deeper diagnostics into collapsible sections.

Lesson: debugging UX is product work.

9. Deletion needed a better contract than “scan tool also deletes”

We intentionally avoided mixing scanning and deletion. Instead we added dedicated move_to_trash and delete_duplicate_group_except_keep tools.

Lesson: explicit, small, safe tool contracts are easier to trust and maintain.

App Store reality check

We also explored shipping MCP support through the Mac App Store build. The main problem was not whether the binary could exist in the app bundle, but whether it could be executed reliably under sandbox and distribution constraints.

We hit issues like:

  • executable permissions after copying to the app container
  • Operation not permitted when launching a copied companion from inside the sandbox container
  • App Store validation issues around embedded executables, entitlements, and runtime expectations

Where we landed (for now):

  • MAS build: no MCP feature (UI hidden/disabled)
  • Direct-download build: MCP enabled, with bundled companion and setup UI

This was the pragmatic way to ship something reliable.

Distribution strategy that actually worked

For the direct-download build, the workable setup was:

  • bundle a standalone MCP companion binary in app resources
  • install or repair a local launcher
  • auto-generate Codex config pointing to a stable command
  • include in-app probe/self-test tools
  • expose runtime diagnostics in a debug section

This removed Xcode DerivedData from the user flow and made setup something real users could actually complete.
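For reference, the generated Codex entry amounts to a few lines of ~/.codex/config.toml; the server name and launcher path here are illustrative placeholders, not the app's real install locations:

```toml
# ~/.codex/config.toml (paths are illustrative)
[mcp_servers.unclutr-files]
command = "/Users/you/Library/Application Support/unclutr/mcp-launcher"
args = []
```

The point of the launcher indirection is that this path stays stable across app updates, so the config never has to be regenerated.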

UX lessons

The MCP feature was not just backend work. A lot of the real effort ended up in UX and diagnostics.

We added an AI tools (MCP) settings card with:

  • clear status (ready vs setup required)
  • one-click install/repair
  • one-click apply Codex config
  • self-test and probe actions
  • debug details (collapsed by default)
  • explicit limitations and requirements

Why this mattered: when MCP fails, users usually cannot tell whether the problem is the app, the server, the launcher, the client config, or the client itself. Good diagnostics cut support and debugging time dramatically.

Safety model for file actions (and why)

We intentionally separated scan from delete before exposing the tools to AI clients.

Deletion safeguards:

  • explicit deletion tool call required
  • absolute paths only
  • explicit keep_path for duplicate-group deletion
  • dry_run support
  • require_exists support
  • move to Trash (recoverable), not hard delete

If you are building file-system MCP tools, I strongly recommend this pattern.
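The safeguards above reduce to a small validation step. An illustrative Python sketch, not the app's Swift implementation, with the actual Trash call stubbed out:

```python
from pathlib import Path

def plan_group_deletion(paths, keep_path, dry_run=True, require_exists=False):
    """Validate a delete_duplicate_group_except_keep-style request.

    Returns the paths that would be (or were) moved to Trash; raises on any
    contract violation instead of guessing.
    """
    if not all(Path(p).is_absolute() for p in [*paths, keep_path]):
        raise ValueError("absolute paths only")
    if keep_path not in paths:
        raise ValueError("keep_path must be a member of the duplicate group")
    to_trash = [p for p in paths if p != keep_path]
    if require_exists:
        missing = [p for p in to_trash if not Path(p).exists()]
        if missing:
            raise FileNotFoundError(f"missing: {missing}")
    if dry_run:
        return {"dry_run": True, "would_trash": to_trash}
    # a real server would call the macOS Trash API here (recoverable, not rm)
    return {"dry_run": False, "trashed": to_trash}
```

Failing loudly on a bad keep_path is deliberate: an AI client that hallucinates a path should get an error, never a silent fallback.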

End-to-end example

This is the core workflow a user (or Codex/Claude acting for the user) can follow to clean duplicates safely with unclutr files MCP:

1. Resolve “Downloads” to an absolute path

{
  "name": "resolve_common_paths",
  "arguments": {
    "terms": ["Downloads"]
  }
}

Expected shape of result:

{
  "resolved_count": 1,
  "resolved": [
    {
      "term": "Downloads",
      "path": "/Users/you/Downloads",
      "kind": "directory",
      "exists": true
    }
  ],
  "unresolved_terms": []
}

2. Scan for exact duplicates

{
  "name": "scan_exact_duplicates",
  "arguments": {
    "paths": ["/Users/you/Downloads"],
    "recursive": true,
    "include_hidden": false,
    "follow_symlinks": false,
    "max_groups": 20
  }
}

One duplicate group in the result may look like:

{
  "files": [
    "/Users/you/Downloads/report.pdf",
    "/Users/you/Downloads/report copy.pdf",
    "/Users/you/Downloads/report (1).pdf"
  ],
  "file_count": 3,
  "reclaimable_bytes": 245760
}

3. Dry-run deletion

{
  "name": "delete_duplicate_group_except_keep",
  "arguments": {
    "paths": [
      "/Users/you/Downloads/report.pdf",
      "/Users/you/Downloads/report copy.pdf",
      "/Users/you/Downloads/report (1).pdf"
    ],
    "keep_path": "/Users/you/Downloads/report.pdf",
    "dry_run": true
  }
}

Review what would be trashed.

4. Actual deletion

{
  "name": "delete_duplicate_group_except_keep",
  "arguments": {
    "paths": [
      "/Users/you/Downloads/report.pdf",
      "/Users/you/Downloads/report copy.pdf",
      "/Users/you/Downloads/report (1).pdf"
    ],
    "keep_path": "/Users/you/Downloads/report.pdf",
    "dry_run": false
  }
}

That is the core UX: resolve → scan → choose keep → trash others.

Technical Appendix: What made this work

MCP transport choice

We implemented a local stdio MCP server (Swift-native) instead of an HTTP server.

Why stdio:

  • fits Codex/Claude local MCP expectations well
  • no port management
  • simpler local trust model
  • easy for a desktop app to launch and probe directly

Core server capabilities

Current tools:

  • resolve_common_paths — natural-language macOS path resolution
  • scan_exact_duplicates — byte-identical duplicate detection
  • move_to_trash — explicit trash action for absolute paths
  • delete_duplicate_group_except_keep — keep one, trash the rest

Debugging model

We stopped treating failures as “MCP is broken” and used the layered model (binary health, launcher health, client config health, client runtime behavior). That is what made debugging tractable.

The handshake bug we hit

We had a stretch where manual runs worked sometimes, app probe worked, and Codex/Claude still failed during initialize. The root issue was stdin handling under real client launch patterns. Moving to async byte-stream stdin and improving send/logging made behavior much more consistent and debuggable.

Probe strategy

We added an in-app “Probe configured MCP server” action that launches the exact configured command, sends initialize + tools/list, captures stdout/stderr, and shows exit code plus timeout status in the UI.

This was one of the highest-leverage additions in the whole project.

Packaging: dev vs real user

Dev-only setup (worked first): MCP server path inside Xcode DerivedData app build.

Real-user setup (direct download build): bundled companion in app resources, launcher install/repair, auto-apply Codex config, and in-app self-test/probe/diagnostics.

This removed machine-specific Xcode paths from the user flow.

App Store vs direct distribution

We investigated MAS-friendly MCP delivery, but practical sandbox/runtime constraints made it unreliable for now. Current strategy is simple: the MAS build ships without MCP, and the direct build ships with MCP enabled.

This is a product/distribution decision as much as a technical one.

What I would do earlier next time

  1. Build a tiny probe harness first (initialize + tools/list only) with clear stdout/stderr logging.
  2. Add layered diagnostics from day one (binary health, launcher health, client config health).
  3. Decide distribution strategy earlier (MAS vs direct constraints change architecture).
  4. Keep tool contracts small and explicit (scan-only and delete-only, no hidden side effects).

What’s next for unclutr files MCP

Likely next steps:

  • better duplicate-group filtering (for example ignoring .app bundles or package contents)
  • smarter “recommend keep file” heuristics (newest, original name, path preference)
  • batch cleanup flows with confirmation summaries
  • support for more MCP clients with setup presets
  • opt-in telemetry / debug export for support cases

If you’re building an MCP server for a desktop app

My practical advice:

  • prove the server first
  • then prove the launcher
  • then prove one client
  • then design the user setup flow
  • treat diagnostics as a feature, not an afterthought

The protocol part is usually the easy part. Packaging, permissions, and supportability are where the real work starts.