LibreChat/config/migrate-orphaned-agent-files.js
Danny Avila 181d705579 🧹 fix: Clean Up Orphaned Agent File Stubs After Deletion (#12781)
* 🧹 fix: Prune Orphaned File References on File Deletion

Deleting a file via the Manage Files tab left its file_id in every agent's
tool_resources.*.file_ids. These stubs accumulate until the frontend's dedupe
check flags them as duplicates and blocks all new uploads (issue #12776).

- Add removeAgentResourceFilesFromAllAgents in packages/data-schemas: a
  single updateMany/$pullAll across every EToolResources category.
- Invoke it from processDeleteRequest after db.deleteFiles so every
  referencing agent is cleaned up, not just the one passed in req.body.
- Wrap the cleanup in try/catch so a stale agent update cannot mask a
  successful file deletion.
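This is a minimal sketch of the single-query cleanup. It assumes the same Mongo-style `$pullAll` update shape that the migration script also uses. `buildPullAllFromAllAgents` and `EXAMPLE_TOOL_RESOURCE_KEYS` are hypothetical names. The key list is illustrative, not the real agent-side list. The `$or` filter, which narrows the update to agents that reference the files, is one plausible choice and not necessarily what `removeAgentResourceFilesFromAllAgents` does.

```javascript
// Illustrative category list (assumption): the real agent-side list is
// TOOL_RESOURCE_KEYS, whose exact members are not reproduced here.
const EXAMPLE_TOOL_RESOURCE_KEYS = ['file_search', 'execute_code', 'context'];

/**
 * Hypothetical helper: builds the (filter, update) pair for a single
 * updateMany that strips `fileIds` from every tool_resources category.
 */
function buildPullAllFromAllAgents(fileIds, keys = EXAMPLE_TOOL_RESOURCE_KEYS) {
  const paths = keys.map((key) => `tool_resources.${key}.file_ids`);

  // Only match agents that reference at least one of the deleted files.
  const filter = { $or: paths.map((p) => ({ [p]: { $in: fileIds } })) };

  // One $pullAll clause per category, mirroring the migration's pullAllOps.
  const $pullAll = {};
  for (const p of paths) {
    $pullAll[p] = fileIds;
  }
  return { filter, update: { $pullAll } };
}

// Usage, assuming a Mongoose Agent model:
//   const { filter, update } = buildPullAllFromAllAgents(['file-a']);
//   await Agent.updateMany(filter, update);
```

Keeping this a pure function means the generated query can be checked without a database.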

* 🧼 fix: Prune Orphaned File References on Agent Update

Already-affected agents would stay broken even after the delete-time fix:
the stubs sit on the agent document until something strips them. Heal them
on the next save (issue #12776).

- Add collectToolResourceFileIds + stripFileIdsFromToolResources helpers
  in @librechat/api — centralizing the tool_resources traversal used by
  the controller and the follow-up migration script.
- In updateAgentHandler, check the effective tool_resources against the
  files collection. When orphans are found, either strip them from the
  incoming tool_resources (if the update sets them) or run the bulk
  cleanup (if the update leaves tool_resources untouched).
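The two helpers can be sketched as plain traversals over a `tool_resources` object. These are hypothetical reimplementations that assume each category holds an optional `file_ids` array. They walk every key present. The later review round in this commit switches the real code to an explicit list of agent-side categories instead.

```javascript
// Hypothetical sketches; the real helpers live in @librechat/api and may differ.

/** Returns the unique file_ids referenced across all tool_resources categories. */
function collectToolResourceFileIds(toolResources) {
  const ids = new Set();
  for (const resource of Object.values(toolResources ?? {})) {
    for (const id of resource?.file_ids ?? []) {
      ids.add(id);
    }
  }
  return [...ids];
}

/** Returns a copy of toolResources with every id in `fileIds` removed. */
function stripFileIdsFromToolResources(toolResources, fileIds) {
  const drop = new Set(fileIds);
  const result = {};
  for (const [key, resource] of Object.entries(toolResources ?? {})) {
    if (!resource || !Array.isArray(resource.file_ids)) {
      // Categories without a file_ids array pass through unchanged.
      result[key] = resource;
      continue;
    }
    result[key] = { ...resource, file_ids: resource.file_ids.filter((id) => !drop.has(id)) };
  }
  return result;
}
```

Returning a copy rather than mutating keeps the incoming request payload intact in case the caller needs it for logging.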

* 🧰 chore: Add Migration to Clean Up Orphaned Agent File References

Complements the delete-time and save-time fixes by healing agents that
already accumulated orphan stubs before the upgrade (issue #12776). The
script is idempotent — re-running it on a clean database is a no-op.

- Add config/migrate-orphaned-agent-files.js following the existing
  migrate-*.js convention: an opt-in --dry-run flag (the script writes by
  default) and a --batch-size= tuning knob. Streams agents via a cursor.
- Register migrate:orphaned-agent-files and :dry-run npm scripts.
- Reuse collectToolResourceFileIds from @librechat/api so migration and
  runtime share the same traversal logic.
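The migration's core check can be modeled in memory, along with the reason a second run is a no-op. This is a simplified sketch, not the script itself. Plain arrays stand in for the Agent cursor, `File.find`, and `$pullAll`. `findOrphans` and `migratePass` are hypothetical names.

```javascript
// Hypothetical in-memory model: `agents` stands in for the Agent collection and
// `existingFileIds` for the file_ids present in the File collection.

/** Referenced ids with no matching File document (the script's Set lookup). */
function findOrphans(referencedFileIds, existingFileIds) {
  const existing = new Set(existingFileIds);
  return referencedFileIds.filter((id) => !existing.has(id));
}

/** One pass over all agents; mutates them in place and returns how many changed. */
function migratePass(agents, existingFileIds) {
  let agentsUpdated = 0;
  for (const agent of agents) {
    const resources = agent.tool_resources ?? {};
    const referenced = Object.values(resources).flatMap((r) => r?.file_ids ?? []);
    const orphans = findOrphans(referenced, existingFileIds);
    if (orphans.length === 0) {
      continue; // nothing to write, which is what makes re-runs a no-op
    }
    for (const resource of Object.values(resources)) {
      if (Array.isArray(resource?.file_ids)) {
        resource.file_ids = resource.file_ids.filter((id) => !orphans.includes(id));
      }
    }
    agentsUpdated++;
  }
  return agentsUpdated;
}
```

After one pass every remaining reference has a matching file, so the next pass finds no orphans and writes nothing.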

* 🩹 fix: Address Codex/Copilot Review on Orphaned Agent File Cleanup

Refines the #12776 fix series based on automated review feedback.

- Scope save-time pruning to the current agent only. When a PATCH
  carries tool_resources, strip orphans from the incoming payload and
  pay the DB round-trip only then. Removes the collection-wide
  updateMany previously triggered when tool_resources was absent
  (Codex P2 / Copilot).
- Wrap the orphan check in try/catch so a transient db.getFiles
  failure can't turn a good save into a 500 (comprehensive review #1).
- Replace Object.values(EToolResources) casts with an explicit list of
  agent-side categories in both orphans.ts and agent.ts. code_interpreter
  belongs to the Assistants API and isn't a key of AgentToolResources —
  including it was a type lie and generated dead MongoDB clauses
  (comprehensive review #3, #8).
- Export TOOL_RESOURCE_KEYS from @librechat/api and consume it in the
  migration script, dropping one duplicated definition (#4).
- Cap migration results.details at 50 sample entries so the memory
  footprint stays bounded on deployments with thousands of corrupted
  agents (Codex P3).
- Add migrate:orphaned-agent-files:batch npm script to match the
  convention set by migrate-agent-permissions / migrate-prompt-permissions
  (#7).
- Add controller-level tests covering the three orphan-pruning paths:
  strip from incoming tool_resources, leave alone when tool_resources
  is absent, swallow db.getFiles errors and still save (#6).
- Back the file_ids in the pre-existing "should validate tool_resources in
  updates" test with real File docs. The new pruning would otherwise strip
  them, and that test is about OCR conversion and schema filtering, not
  file existence. Register the File model in beforeAll so the fixture
  works.
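Save-time pruning scoped to a single PATCH can be sketched as below. The sketch assumes the handler passes the incoming update through a helper before saving. `pruneIncomingToolResources` is a hypothetical name. `getFiles` is an injected stand-in for `db.getFiles`; it is assumed to accept a Mongo-style filter and resolve to objects with a `file_id`.

```javascript
/**
 * Hypothetical sketch of save-time pruning for one agent update.
 * `logger` is any object with a `warn` method.
 */
async function pruneIncomingToolResources(updateData, { getFiles, logger = console }) {
  // Only pay the DB round-trip when the PATCH actually carries tool_resources.
  if (!updateData.tool_resources) {
    return updateData;
  }
  try {
    const resources = updateData.tool_resources;
    const referenced = [...new Set(Object.values(resources).flatMap((r) => r?.file_ids ?? []))];
    if (referenced.length === 0) {
      return updateData;
    }
    const found = await getFiles({ file_id: { $in: referenced } });
    const existing = new Set(found.map((f) => f.file_id));
    const pruned = {};
    for (const [key, r] of Object.entries(resources)) {
      pruned[key] = Array.isArray(r?.file_ids)
        ? { ...r, file_ids: r.file_ids.filter((id) => existing.has(id)) }
        : r;
    }
    return { ...updateData, tool_resources: pruned };
  } catch (error) {
    // A failed orphan check must not turn a good save into a 500: save unpruned.
    logger.warn('Orphan check failed; saving without pruning', error.message);
    return updateData;
  }
}
```

Injecting `getFiles` makes all three controller-level cases from the list above testable without a database: strip, leave untouched, and swallow the error.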

* 🩹 fix: Tighten TOOL_RESOURCE_KEYS Type and Align Migration Sample Output

Two follow-ups from the second review pass.

- Type data-schemas' TOOL_RESOURCE_KEYS as ReadonlyArray<keyof
  AgentToolResources> instead of readonly string[]. Data-schemas depends
  on data-provider, so the import is clean. Catches typos and aligns
  with the matching export in @librechat/api — doesn't guarantee
  exhaustiveness, but that's a TypeScript limitation, not a workspace
  one.
- Align the migration's console output with DETAIL_SAMPLE_LIMIT: print
  every collected detail (up to 50) and, when more agents were affected
  than the sample size allowed, show a truncation notice. The old hard
  cap of 25 meant affected agents in the 26-50 range were collected
  but never shown.
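The sampling and truncation logic can be isolated as below. `collectSample` and `formatSample` are hypothetical helpers. Their output approximates the script's console output, except that the script uses a different separator character before the orphan count. `SAMPLE_LIMIT` mirrors `DETAIL_SAMPLE_LIMIT`.

```javascript
const SAMPLE_LIMIT = 50; // mirrors DETAIL_SAMPLE_LIMIT in the script

/** Keeps at most `limit` sample entries while still counting every affected agent. */
function collectSample(affected, limit = SAMPLE_LIMIT) {
  const details = [];
  let agentsWithOrphans = 0;
  for (const entry of affected) {
    agentsWithOrphans++;
    if (details.length < limit) {
      details.push(entry);
    }
  }
  return { agentsWithOrphans, details };
}

/** Formats every collected entry, then a truncation notice if some were not sampled. */
function formatSample({ agentsWithOrphans, details }, limit = SAMPLE_LIMIT) {
  const lines = details.map(
    (d, i) => ` ${i + 1}. "${d.name}" (${d.agentId}) - ${d.orphanCount} orphan(s)`,
  );
  if (agentsWithOrphans > details.length) {
    lines.push(` ... and ${agentsWithOrphans - details.length} more (sample capped at ${limit})`);
  }
  return lines;
}
```

With 53 affected agents, entries 1 through 50 are all printed, followed by one notice for the remaining 3. This is the 26-50 gap the old hard cap of 25 left open.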

* test: Add Integration Coverage for Orphan Cleanup Paths (#12776)

Exercise the delete-time and migration paths end-to-end against a real
in-memory Mongo. Catches integration bugs the isolated unit tests on
each layer couldn't.

- api/server/services/Files/process.integration.spec.js — the primary
  repro: seed an Agent + File, call processDeleteRequest, assert the
  file_id disappears from every referencing agent's tool_resources
  while unrelated agents stay untouched. Also covers the no-op case
  and confirms a failure in the new cleanup step cannot roll back the
  file deletion itself.
- api/test/migrate-orphaned-agent-files.spec.js — drives the migration
  module: --dry-run reports without writing, apply mode prunes across
  every tool_resource category, re-running is idempotent, and
  DETAIL_SAMPLE_LIMIT caps the in-memory sample on wide corruption.
  Mocks only the connect helper (the spec owns the mongoose instance)
  so the real migration code path — cursor, $pullAll, reduce — runs.
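The integration spec pins down one property: a failing cleanup step cannot undo a completed deletion. That property follows from ordering. The deletion is awaited first, and the cleanup then runs inside its own try/catch. Below is a minimal sketch. `deleteWithCleanup` is a hypothetical name. `deleteFiles` and `pruneAgents` are injected stand-ins for `db.deleteFiles` and the bulk agent cleanup.

```javascript
/**
 * Hypothetical shape of the delete path: deletion first, then best-effort
 * cleanup of agent references.
 */
async function deleteWithCleanup(fileIds, { deleteFiles, pruneAgents, logger = console }) {
  const deleted = await deleteFiles(fileIds);
  try {
    await pruneAgents(fileIds);
  } catch (error) {
    // Cleanup failures are logged, never rethrown: the files are already gone.
    logger.error('Agent cleanup failed after file deletion', error.message);
  }
  return deleted;
}
```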

* 🔒 fix: Run Orphan Cleanup Migration in System Tenant Context

Codex P2 catch: under TENANT_ISOLATION_STRICT=true, the migration
throws on the very first Agent.countDocuments() because the tenant
isolation plugin fail-closes on queries without tenant context — which
makes migrate:orphaned-agent-files unusable on the exact deployments
most likely to have accumulated corruption.

- Wrap the scan/prune body in runAsSystem so queries bypass the tenant
  filter (SYSTEM_TENANT_ID sentinel). The migration legitimately needs
  cross-tenant visibility — this is the same pattern seedDatabase and
  the S3 refresh job already use.
- Add a regression test that spies on Agent.countDocuments() and
  asserts the active tenantStorage context is SYSTEM_TENANT_ID during
  the call. Pins the wrap against future regressions without the
  brittleness of toggling the strict-mode env var (which caches on
  first read).

Note: the delete-time and save-time paths already run inside an
authenticated HTTP request where tenantStorage.run is set by auth
middleware, so the cleanup naturally scopes to the current tenant —
which is the correct behavior there since file ownership is
tenant-scoped.
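The system-context wrap can be modeled with Node's `AsyncLocalStorage`. This is an assumption about the mechanism. The commit names `tenantStorage`, `runAsSystem`, and `SYSTEM_TENANT_ID`, but their real implementations are not shown here. The sentinel value and the hypothetical `tenantFilter` are therefore illustrative only; `tenantFilter` is a fail-closed stand-in for the isolation plugin.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical stand-ins for the names mentioned in the commit message.
const tenantStorage = new AsyncLocalStorage();
const SYSTEM_TENANT_ID = '__system__'; // assumed sentinel value

/** Runs `fn` with the system sentinel as the active tenant context. */
function runAsSystem(fn) {
  return tenantStorage.run({ tenantId: SYSTEM_TENANT_ID }, fn);
}

/** Models a strict isolation plugin: returns the filter to apply to a query. */
function tenantFilter(baseFilter = {}) {
  const tenantId = tenantStorage.getStore()?.tenantId;
  if (tenantId === SYSTEM_TENANT_ID) {
    return baseFilter; // system context: cross-tenant visibility
  }
  if (!tenantId) {
    // Fail closed, as under TENANT_ISOLATION_STRICT=true
    throw new Error('Query issued without tenant context');
  }
  return { ...baseFilter, tenantId };
}
```

The behavior below holds only if `tenantStorage` is an `AsyncLocalStorage`. In that case the store propagates across `await`, so every query issued inside `runAsSystem(async () => ...)` sees the sentinel, including queries in the cursor loop. Request-scoped code keeps whatever tenant the auth middleware set.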

* 🧹 chore: Drop Unused path Import From Process Integration Spec

Leftover from an earlier iteration that resolved the migration path
via path.resolve before I switched to a relative require. The import
does nothing now — removing it.
2026-04-22 11:35:48 -07:00

const path = require('path');
const { logger, runAsSystem } = require('@librechat/data-schemas');
const { TOOL_RESOURCE_KEYS, collectToolResourceFileIds } = require('@librechat/api');
require('module-alias')({ base: path.resolve(__dirname, '..', 'api') });
const connect = require('./connect');
const { Agent, File } = require('~/db/models');

/**
 * Cap on the number of per-agent entries we retain in `results.details`. Larger
 * runs still update every affected agent and still report accurate aggregate
 * counts — we just stop accumulating sample data past this threshold to keep
 * memory bounded on deployments with thousands of corrupted agents.
 */
const DETAIL_SAMPLE_LIMIT = 50;
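
/**
 * Cleans up orphaned file_id references from agent `tool_resources`: file_ids
 * that remain on an agent after the underlying File document has already been
 * deleted (see issue #12776). These stubs otherwise accumulate and eventually
 * block new uploads with "Duplicate file detected."
 *
 * Safe to re-run: if there are no orphans, nothing is written.
 *
 * @param {{ dryRun?: boolean, batchSize?: number }} [options]
 *   `dryRun` defaults to `true` when this function is called directly (the CLI
 *   entry point writes unless `--dry-run` is passed). `batchSize` is the cursor
 *   batch size and defaults to 100.
 * @returns {Promise<object>} aggregate counts, an error count, and up to
 *   DETAIL_SAMPLE_LIMIT sample entries in `details`.
 */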
async function migrateOrphanedAgentFiles({ dryRun = true, batchSize = 100 } = {}) {
  await connect();
  logger.info('Starting Orphaned Agent Files Migration', { dryRun, batchSize });

  /*
   * Scan and heal across every tenant. Without this wrapper the tenant
   * isolation plugin either scopes queries to a (non-existent) tenant or
   * throws under TENANT_ISOLATION_STRICT=true, making the script unusable
   * as the intended remediation path for corrupted agents.
   */
  return runAsSystem(async () => {
    const totalAgents = await Agent.countDocuments();
    logger.info(`Scanning ${totalAgents} agent(s) for orphaned file references`);

    const results = {
      dryRun,
      scannedAgents: 0,
      agentsWithOrphans: 0,
      agentsUpdated: 0,
      totalOrphansRemoved: 0,
      errors: 0,
      details: [],
    };

    const cursor = Agent.find({}, { id: 1, name: 1, tool_resources: 1 })
      .lean()
      .cursor({ batchSize });

    for await (const agent of cursor) {
      results.scannedAgents++;
      try {
        const referencedFileIds = collectToolResourceFileIds(agent.tool_resources);
        if (referencedFileIds.length === 0) {
          continue;
        }

        const existing = await File.find(
          { file_id: { $in: referencedFileIds } },
          { file_id: 1, _id: 0 },
        ).lean();
        const existingIds = new Set(existing.map((f) => f.file_id));
        const orphans = referencedFileIds.filter((id) => !existingIds.has(id));
        if (orphans.length === 0) {
          continue;
        }

        results.agentsWithOrphans++;
        results.totalOrphansRemoved += orphans.length;
        if (results.details.length < DETAIL_SAMPLE_LIMIT) {
          results.details.push({
            agentId: agent.id,
            name: agent.name,
            orphanCount: orphans.length,
            orphans,
          });
        }

        if (dryRun) {
          logger.debug(`[dry-run] Would prune ${orphans.length} orphan(s) from agent ${agent.id}`);
          continue;
        }

        const pullAllOps = {};
        for (const key of TOOL_RESOURCE_KEYS) {
          pullAllOps[`tool_resources.${key}.file_ids`] = orphans;
        }
        const updateResult = await Agent.updateOne({ _id: agent._id }, { $pullAll: pullAllOps });
        if (updateResult.modifiedCount > 0) {
          results.agentsUpdated++;
          logger.info(
            `Pruned ${orphans.length} orphan(s) from agent "${agent.name}" (${agent.id})`,
          );
        }
      } catch (error) {
        results.errors++;
        logger.error(`Failed to process agent ${agent.id}`, { error: error.message });
      }
    }

    logger.info('Orphaned Agent Files Migration completed', {
      dryRun,
      scannedAgents: results.scannedAgents,
      agentsWithOrphans: results.agentsWithOrphans,
      agentsUpdated: results.agentsUpdated,
      totalOrphansRemoved: results.totalOrphansRemoved,
      errors: results.errors,
    });
    return results;
  });
}

if (require.main === module) {
  const dryRun = process.argv.includes('--dry-run');
  const batchSizeArg = process.argv.find((arg) => arg.startsWith('--batch-size='));
  const parsedBatchSize = parseInt(batchSizeArg?.split('=')[1], 10);
  // Fall back to the default for a missing, non-numeric, zero, or negative value
  const batchSize = parsedBatchSize > 0 ? parsedBatchSize : 100;

  migrateOrphanedAgentFiles({ dryRun, batchSize })
    .then((result) => {
      console.log(`\n=== ${dryRun ? 'DRY RUN ' : ''}RESULTS ===`);
      console.log(`Agents scanned: ${result.scannedAgents}`);
      console.log(`Agents with orphans: ${result.agentsWithOrphans}`);
      console.log(
        `Orphan references ${dryRun ? 'to remove' : 'removed'}: ${result.totalOrphansRemoved}`,
      );
      if (!dryRun) {
        console.log(`Agents updated: ${result.agentsUpdated}`);
      }
      if (result.errors > 0) {
        console.log(`Errors: ${result.errors}`);
      }
      if (result.details.length > 0) {
        console.log('\nAffected agents:');
        result.details.forEach((d, i) => {
          console.log(` ${i + 1}. "${d.name}" (${d.agentId}) — ${d.orphanCount} orphan(s)`);
        });
        if (result.agentsWithOrphans > result.details.length) {
          console.log(
            ` ... and ${result.agentsWithOrphans - result.details.length} more (sample capped at ${DETAIL_SAMPLE_LIMIT})`,
          );
        }
      }
      process.exit(0);
    })
    .catch((error) => {
      console.error('Orphaned agent files migration failed:', error);
      process.exit(1);
    });
}

module.exports = { migrateOrphanedAgentFiles };