mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-06-15 23:43:06 +03:00
* 🧹 fix: Prune Orphaned File References on File Deletion

Deleting a file via the Manage Files tab left its file_id in every agent's tool_resources.*.file_ids. The stubs accumulate until the frontend dedupe check treats them as duplicates and blocks all new uploads (issue #12776).

- Add removeAgentResourceFilesFromAllAgents in packages/data-schemas: a single updateMany/$pullAll across every EToolResources category.
- Invoke it from processDeleteRequest after db.deleteFiles so every referencing agent is cleaned up, not just the one passed in req.body.
- Wrap the cleanup in try/catch so a stale agent update cannot mask a successful file deletion.

* 🧼 fix: Prune Orphaned File References on Agent Update

Already-affected agents would stay broken even after the delete-time fix: the stubs sit on the agent document until something strips them. Heal them on the next save (issue #12776).

- Add collectToolResourceFileIds + stripFileIdsFromToolResources helpers in @librechat/api, centralizing the tool_resources traversal used by the controller and the follow-up migration script.
- In updateAgentHandler, check the effective tool_resources against the files collection. When orphans are found, either strip them from the incoming tool_resources (if the update sets them) or run the bulk cleanup (if the update leaves tool_resources untouched).

* 🧰 chore: Add Migration to Clean Up Orphaned Agent File References

Complements the delete-time and save-time fixes by healing agents that already accumulated orphan stubs before the upgrade (issue #12776). The script is idempotent: re-running it on a clean database is a no-op.

- Add config/migrate-orphaned-agent-files.js following the existing migrate-*.js convention: a --dry-run flag (omitted by default, so the script writes unless it is passed) and a --batch-size= tuning knob. Streams agents via cursor.
- Register migrate:orphaned-agent-files and :dry-run npm scripts.
- Reuse collectToolResourceFileIds from @librechat/api so migration and runtime share the same traversal logic.

* 🩹 fix: Address Codex/Copilot Review on Orphaned Agent File Cleanup

Refines the #12776 fix series based on automated review feedback.

- Scope save-time pruning to the current agent only. When a PATCH carries tool_resources, strip orphans from the incoming payload and pay the DB round-trip only then. Removes the collection-wide updateMany previously triggered when tool_resources was absent (Codex P2 / Copilot).
- Wrap the orphan check in try/catch so a transient db.getFiles failure can't turn a good save into a 500 (comprehensive review #1).
- Replace Object.values(EToolResources) casts with an explicit list of agent-side categories in both orphans.ts and agent.ts. code_interpreter belongs to the Assistants API and isn't a key of AgentToolResources; including it was a type lie and generated dead MongoDB clauses (comprehensive review #3, #8).
- Export TOOL_RESOURCE_KEYS from @librechat/api and consume it in the migration script, dropping one duplicated definition (#4).
- Cap migration results.details at 50 sample entries so the memory footprint stays bounded on deployments with thousands of corrupted agents (Codex P3).
- Add migrate:orphaned-agent-files:batch npm script to match the convention set by migrate-agent-permissions / migrate-prompt-permissions (#7).
- Add controller-level tests covering the three orphan-pruning paths: strip from incoming tool_resources, leave alone when tool_resources is absent, swallow db.getFiles errors and still save (#6).
- Back the pre-existing "should validate tool_resources in updates" test's file_ids with real File docs. The new pruning would otherwise strip them, and that test is about OCR conversion / schema filtering, not file existence. Register the File model in beforeAll so the fixture works.

* 🩹 fix: Tighten TOOL_RESOURCE_KEYS Type and Align Migration Sample Output

Two follow-ups from the second review pass.

- Type data-schemas' TOOL_RESOURCE_KEYS as ReadonlyArray<keyof AgentToolResources> instead of readonly string[]. Data-schemas depends on data-provider, so the import is clean. This catches typos and aligns with the matching export in @librechat/api. It doesn't guarantee exhaustiveness, but that's a TypeScript limitation, not a workspace one.
- Align the migration's console output with DETAIL_SAMPLE_LIMIT: print every collected detail (up to 50) and, when more agents were affected than the sample size allowed, show a truncation notice. The old hard cap of 25 meant affected agents in the 26-50 range were collected but never shown.

* ✅ test: Add Integration Coverage for Orphan Cleanup Paths (#12776)

Exercise the delete-time and migration paths end-to-end against a real in-memory Mongo. Catches integration bugs the isolated unit tests on each layer couldn't.

- api/server/services/Files/process.integration.spec.js is the primary repro: seed an Agent + File, call processDeleteRequest, assert the file_id disappears from every referencing agent's tool_resources while unrelated agents stay untouched. Also covers the no-op case and confirms a failure in the new cleanup step cannot roll back the file deletion itself.
- api/test/migrate-orphaned-agent-files.spec.js drives the migration module: --dry-run reports without writing, apply mode prunes across every tool_resource category, re-running is idempotent, and DETAIL_SAMPLE_LIMIT caps the in-memory sample on wide corruption. Mocks only the connect helper (the spec owns the mongoose instance) so the real migration code path (cursor, $pullAll, reduce) runs.

* 🔒 fix: Run Orphan Cleanup Migration in System Tenant Context

Codex P2 catch: under TENANT_ISOLATION_STRICT=true, the migration throws on the very first Agent.countDocuments() because the tenant isolation plugin fail-closes on queries without tenant context. That makes migrate:orphaned-agent-files unusable on the exact deployments most likely to have accumulated corruption.

- Wrap the scan/prune body in runAsSystem so queries bypass the tenant filter (SYSTEM_TENANT_ID sentinel). The migration legitimately needs cross-tenant visibility; this is the same pattern seedDatabase and the S3 refresh job already use.
- Add a regression test that spies on Agent.countDocuments() and asserts the active tenantStorage context is SYSTEM_TENANT_ID during the call. Pins the wrap against future regressions without the brittleness of toggling the strict-mode env var (which caches on first read).

Note: the delete-time and save-time paths already run inside an authenticated HTTP request where tenantStorage.run is set by auth middleware, so the cleanup naturally scopes to the current tenant. That is the correct behavior there, since file ownership is tenant-scoped.

* 🧹 chore: Drop Unused path Import From Process Integration Spec

Leftover from an earlier iteration that resolved the migration path via path.resolve before I switched to a relative require. The import does nothing now, so it is removed.
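The orphan-detection and pruning steps described in the commits above can be sketched without a database. This is a minimal illustration only: the helper names (`collectFileIds`, `findOrphans`, `buildPullAllUpdate`) and the two-entry `SAMPLE_TOOL_RESOURCE_KEYS` list are hypothetical stand-ins, not the actual exports of @librechat/api, and the real category list may differ.

```javascript
// Hypothetical category list for illustration; the real TOOL_RESOURCE_KEYS
// is exported from @librechat/api and may contain different entries.
const SAMPLE_TOOL_RESOURCE_KEYS = ['execute_code', 'file_search'];

/** Gather every file_id under tool_resources.<key>.file_ids, de-duplicated. */
function collectFileIds(toolResources, keys = SAMPLE_TOOL_RESOURCE_KEYS) {
  const ids = new Set();
  for (const key of keys) {
    for (const id of toolResources?.[key]?.file_ids ?? []) {
      ids.add(id);
    }
  }
  return [...ids];
}

/** Orphans are referenced ids that have no surviving File document. */
function findOrphans(referencedIds, existingIds) {
  const existing = new Set(existingIds);
  return referencedIds.filter((id) => !existing.has(id));
}

/** Build a single $pullAll update that removes the orphans from every category. */
function buildPullAllUpdate(orphans, keys = SAMPLE_TOOL_RESOURCE_KEYS) {
  const pullAll = {};
  for (const key of keys) {
    pullAll[`tool_resources.${key}.file_ids`] = orphans;
  }
  return { $pullAll: pullAll };
}

// Sample agent document (made-up data).
const agent = {
  tool_resources: {
    execute_code: { file_ids: ['a', 'b'] },
    file_search: { file_ids: ['b', 'c'] },
  },
};

const referenced = collectFileIds(agent.tool_resources);
const orphans = findOrphans(referenced, ['a']); // pretend only "a" still exists
const update = buildPullAllUpdate(orphans);
console.log(JSON.stringify(update, null, 2));
```

Applying the same orphan list to every category keeps the update to one write per agent; MongoDB's `$pullAll` leaves a field untouched when none of the listed values are present, so categories that never held the orphaned ids are unaffected.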
161 lines
5.5 KiB
JavaScript
const path = require('path');
const { logger, runAsSystem } = require('@librechat/data-schemas');
const { TOOL_RESOURCE_KEYS, collectToolResourceFileIds } = require('@librechat/api');

require('module-alias')({ base: path.resolve(__dirname, '..', 'api') });
const connect = require('./connect');

const { Agent, File } = require('~/db/models');

/**
 * Cap on the number of per-agent entries we retain in `results.details`. Larger
 * runs still update every affected agent and still report accurate aggregate
 * counts; we just stop accumulating sample data past this threshold to keep
 * memory bounded on deployments with thousands of corrupted agents.
 */
const DETAIL_SAMPLE_LIMIT = 50;

/**
 * Cleans up orphaned file_id references from agent `tool_resources`, that is,
 * file_ids that remain on an agent after the underlying File document has
 * already been deleted (see issue #12776). These stubs otherwise accumulate and
 * eventually block new uploads with "Duplicate file detected."
 *
 * Safe to re-run: if there are no orphans, nothing is written.
 *
 * @param {{ dryRun?: boolean, batchSize?: number }} [options]
 */
async function migrateOrphanedAgentFiles({ dryRun = true, batchSize = 100 } = {}) {
  await connect();

  logger.info('Starting Orphaned Agent Files Migration', { dryRun, batchSize });

  /*
   * Scan and heal across every tenant. Without this wrapper the tenant
   * isolation plugin either scopes queries to a (non-existent) tenant or
   * throws under TENANT_ISOLATION_STRICT=true, making the script unusable
   * as the intended remediation path for corrupted agents.
   */
  return runAsSystem(async () => {
    const totalAgents = await Agent.countDocuments();
    logger.info(`Scanning ${totalAgents} agent(s) for orphaned file references`);

    const results = {
      dryRun,
      scannedAgents: 0,
      agentsWithOrphans: 0,
      agentsUpdated: 0,
      totalOrphansRemoved: 0,
      errors: 0,
      details: [],
    };

    const cursor = Agent.find({}, { id: 1, name: 1, tool_resources: 1 })
      .lean()
      .cursor({ batchSize });

    for await (const agent of cursor) {
      results.scannedAgents++;

      try {
        const referencedFileIds = collectToolResourceFileIds(agent.tool_resources);
        if (referencedFileIds.length === 0) {
          continue;
        }

        const existing = await File.find(
          { file_id: { $in: referencedFileIds } },
          { file_id: 1, _id: 0 },
        ).lean();
        const existingIds = new Set(existing.map((f) => f.file_id));
        const orphans = referencedFileIds.filter((id) => !existingIds.has(id));
        if (orphans.length === 0) {
          continue;
        }

        results.agentsWithOrphans++;
        results.totalOrphansRemoved += orphans.length;
        if (results.details.length < DETAIL_SAMPLE_LIMIT) {
          results.details.push({
            agentId: agent.id,
            name: agent.name,
            orphanCount: orphans.length,
            orphans,
          });
        }

        if (dryRun) {
          logger.debug(`[dry-run] Would prune ${orphans.length} orphan(s) from agent ${agent.id}`);
          continue;
        }

        const pullAllOps = {};
        for (const key of TOOL_RESOURCE_KEYS) {
          pullAllOps[`tool_resources.${key}.file_ids`] = orphans;
        }
        const updateResult = await Agent.updateOne({ _id: agent._id }, { $pullAll: pullAllOps });
        if (updateResult.modifiedCount > 0) {
          results.agentsUpdated++;
          logger.info(
            `Pruned ${orphans.length} orphan(s) from agent "${agent.name}" (${agent.id})`,
          );
        }
      } catch (error) {
        results.errors++;
        logger.error(`Failed to process agent ${agent.id}`, { error: error.message });
      }
    }

    logger.info('Orphaned Agent Files Migration completed', {
      dryRun,
      scannedAgents: results.scannedAgents,
      agentsWithOrphans: results.agentsWithOrphans,
      agentsUpdated: results.agentsUpdated,
      totalOrphansRemoved: results.totalOrphansRemoved,
      errors: results.errors,
    });

    return results;
  });
}

if (require.main === module) {
  const dryRun = process.argv.includes('--dry-run');
  // Parse --batch-size=<n> as a base-10 integer; fall back to 100 when absent or invalid.
  const batchSize =
    parseInt(process.argv.find((arg) => arg.startsWith('--batch-size='))?.split('=')[1], 10) ||
    100;

  migrateOrphanedAgentFiles({ dryRun, batchSize })
    .then((result) => {
      console.log(`\n=== ${dryRun ? 'DRY RUN ' : ''}RESULTS ===`);
      console.log(`Agents scanned: ${result.scannedAgents}`);
      console.log(`Agents with orphans: ${result.agentsWithOrphans}`);
      console.log(
        `Orphan references ${dryRun ? 'to remove' : 'removed'}: ${result.totalOrphansRemoved}`,
      );
      if (!dryRun) {
        console.log(`Agents updated: ${result.agentsUpdated}`);
      }
      if (result.errors > 0) {
        console.log(`Errors: ${result.errors}`);
      }
      if (result.details.length > 0) {
        console.log('\nAffected agents:');
        result.details.forEach((d, i) => {
          console.log(`  ${i + 1}. "${d.name}" (${d.agentId}) - ${d.orphanCount} orphan(s)`);
        });
        if (result.agentsWithOrphans > result.details.length) {
          console.log(
            `  ... and ${result.agentsWithOrphans - result.details.length} more (sample capped at ${DETAIL_SAMPLE_LIMIT})`,
          );
        }
      }
      process.exit(0);
    })
    .catch((error) => {
      console.error('Orphaned agent files migration failed:', error);
      process.exit(1);
    });
}

module.exports = { migrateOrphanedAgentFiles };