{
"sourceFile": "Tickets/FEAT-004-handle-multi-asset-inputs.md",
"activeCommit": 0,
"commits": [
{
"activePatchIndex": 1,
"patches": [
{
"date": 1745315823241,
"content": "Index: \n===================================================================\n--- \n+++ \n"
},
{
"date": 1745316389879,
"content": "Index: \n===================================================================\n--- \n+++ \n@@ -0,0 +1,162 @@\n+---\r\n+ID: FEAT-004\r\n+Type: Feature\r\n+Status: Planned\r\n+Priority: Medium\r\n+Labels: [core, gui, cli, feature, enhancement]\r\n+Created: 2025-04-22\r\n+Updated: 2025-04-22\r\n+Related: #ISSUE-001\r\n+---\r\n+\r\n+# [FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index\r\n+\r\n+## Description\r\n+Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the `source_naming.part_indices.base_name` rule in the preset), the tool's fallback logic uses `os.path.commonprefix` to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.\r\n+\r\n+## Current Behavior\r\n+When processing an input containing files from multiple assets (e.g., `3-HeartOak...` and `3-Oak-Classic...` in the same ZIP), the `_determine_base_metadata` method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., \"3-\"). The processing pipeline and GUI prediction then proceed using this incorrect name.\r\n+\r\n+## Desired Behavior / Goals\r\n+The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the `source_naming.part_indices.base_name` rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete `metadata.json` file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.\r\n+\r\n+## Implementation Notes (Optional)\r\n+* Modify `AssetProcessor._determine_base_metadata` to return a list of distinct base names and a mapping of files to their determined base names.\r\n+* Adjust the main processing orchestration (`main.py`, `gui/processing_handler.py`) to iterate over the list of distinct base names returned by `_determine_base_metadata`.\r\n+* For each distinct base name, create a new processing context (potentially a new `AssetProcessor` instance or a modified approach) that operates only on the files associated with that specific base name.\r\n+* Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.\r\n+* Update `AssetProcessor.get_detailed_file_predictions` to correctly identify and group files by distinct base names for accurate GUI preview display.\r\n+* Consider edge cases: what if some files don't match any determined base name? (They should likely still go to 'Extra/'). What if the index method yields no names? (Fallback to input name as currently).\r\n+\r\n+## Acceptance Criteria (Optional)\r\n+* [ ] Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with `base_name_index` results in two separate output directories (`<output_base>/<supplier>/AssetA/` and `<output_base>/<supplier>/AssetB/`), each containing the correctly processed files and metadata for that asset.\r\n+* [ ] The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).\r\n+* [ ] The CLI processing of a multi-asset input correctly processes and outputs each asset separately.\r\n+* [ ] The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').\r\n+---\r\n+## Implementation Plan (Generated by Architect Mode)\r\n+\r\n+**Goal:** Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the `source_naming.part_indices.base_name` rule, placing unmatched files into the `Extra/` folder of each processed asset.\r\n+\r\n+**Phase 1: Core Logic Refactoring (`asset_processor.py`)**\r\n+\r\n+1. **Refactor `_determine_base_metadata`:**\r\n+ * **Input:** Takes the list of all file paths (relative to temp dir) found after extraction.\r\n+ * **Logic:**\r\n+ * Iterates through relevant file stems (maps, models).\r\n+ * Uses the `source_naming_separator` and `source_naming_indices['base_name']` to extract potential base names for each file stem.\r\n+ * Identifies the set of *distinct* base names found across all files.\r\n+ * Creates a mapping: `Dict[Path, Optional[str]]` where keys are relative file paths and values are the determined base name string (or `None` if a file doesn't match any base name according to the index rule).\r\n+ * **Output:** Returns a tuple: `(distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]])`.\r\n+ * **Remove:** Logic setting `self.metadata[\"asset_name\"]`, `asset_category`, and `archetype`.\r\n+\r\n+2. **Create New Method `_determine_single_asset_metadata`:**\r\n+ * **Input:** Takes a specific `asset_base_name` (string) and the list of `classified_files` *filtered* for that asset.\r\n+ * **Logic:** Contains the logic previously in `_determine_base_metadata` for determining `asset_category` and `archetype` based *only* on the files associated with the given `asset_base_name`.\r\n+ * **Output:** Returns a dictionary containing `{\"asset_category\": str, \"archetype\": str}` for the specific asset.\r\n+\r\n+3. **Modify `_inventory_and_classify_files`:**\r\n+ * No major changes needed here initially, as it classifies based on file patterns independent of the final asset name. However, ensure the `classified_files` structure remains suitable for later filtering.\r\n+\r\n+4. **Refactor `AssetProcessor.process` Method:**\r\n+ * Change the overall flow to handle multiple assets.\r\n+ * **Steps:**\r\n+ 1. `_setup_workspace()`\r\n+ 2. `_extract_input()`\r\n+ 3. `_inventory_and_classify_files()` -> Get initial `self.classified_files` (all files).\r\n+ 4. Call the *new* `_determine_base_metadata()` using all relevant files -> Get `distinct_base_names` list and `file_to_base_name_map`.\r\n+ 5. Initialize an overall status dictionary (e.g., `{\"processed\": [], \"skipped\": [], \"failed\": []}`).\r\n+ 6. **Loop** through each `current_asset_name` in `distinct_base_names`:\r\n+ * Log the start of processing for `current_asset_name`.\r\n+ * **Filter Files:** Create temporary filtered lists of maps, models, etc., from `self.classified_files` based on the `file_to_base_name_map` for the `current_asset_name`.\r\n+ * **Determine Metadata:** Call `_determine_single_asset_metadata(current_asset_name, filtered_files)` -> Get category/archetype for this asset. Store these along with `current_asset_name` and supplier name in a temporary `current_asset_metadata` dict.\r\n+ * **Skip Check:** Perform the skip check logic specifically for `current_asset_name` using the `output_base_path`, supplier name, and `current_asset_name`. If skipped, update overall status and `continue` to the next asset name.\r\n+ * **Process:** Call `_process_maps()`, `_merge_maps()`, passing the *filtered* file lists and potentially the `current_asset_metadata`. These methods need to operate only on the provided subset of files.\r\n+ * **Generate Metadata:** Call `_generate_metadata_file()`, passing the `current_asset_metadata` and the results from map/merge processing for *this asset*. This method will now write `metadata.json` specific to `current_asset_name`.\r\n+ * **Organize Output:** Call `_organize_output_files()`, passing the `current_asset_name`. This method needs modification:\r\n+ * It will move the processed files for the *current asset* to the correct subfolder (`<output_base>/<supplier>/<current_asset_name>/`).\r\n+ * It will also identify files from the *original* input whose base name was `None` in the `file_to_base_name_map` (the \"unmatched\" files).\r\n+ * It will copy these \"unmatched\" files into the `Extra/` subfolder for the *current asset being processed in this loop iteration*.\r\n+ * Update overall status based on the success/failure of this asset's processing.\r\n+ 7. `_cleanup_workspace()` (only after processing all assets from the input).\r\n+ 8. **Return:** Return the overall status dictionary summarizing results across all detected assets.\r\n+\r\n+5. **Adapt `_process_maps`, `_merge_maps`, `_generate_metadata_file`, `_organize_output_files`:**\r\n+ * Ensure these methods accept and use the filtered file lists and the specific `asset_name` for the current iteration.\r\n+ * `_organize_output_files` needs the logic to handle copying the \"unmatched\" files into the current asset's `Extra/` folder.\r\n+\r\n+**Phase 2: Update Orchestration (`main.py`, `gui/processing_handler.py`)**\r\n+\r\n+1. **Modify `main.process_single_asset_wrapper`:**\r\n+ * The call `processor.process()` will now return the overall status dictionary.\r\n+ * The wrapper needs to interpret this dictionary to return a single representative status (\"processed\" if any succeeded, \"skipped\" if all skipped, \"failed\" if any failed) and potentially a consolidated error message for the main loop/GUI.\r\n+\r\n+2. **Modify `gui.processing_handler.ProcessingHandler.run`:**\r\n+ * No major changes needed here, as it relies on `process_single_asset_wrapper`. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future, but for now, the overall status from the wrapper should suffice.\r\n+\r\n+**Phase 3: Update GUI Prediction (`asset_processor.py`, `gui/prediction_handler.py`, `gui/main_window.py`)**\r\n+\r\n+1. **Modify `AssetProcessor.get_detailed_file_predictions`:**\r\n+ * This method must now perform the multi-asset detection:\r\n+ * Call the refactored `_determine_base_metadata` to get the `distinct_base_names` and `file_to_base_name_map`.\r\n+ * Iterate through all classified files (maps, models, extra, ignored).\r\n+ * For each file, look up its corresponding base name in the `file_to_base_name_map`.\r\n+ * The returned dictionary for each file should now include:\r\n+ * `original_path`: str\r\n+ * `predicted_asset_name`: str | None (The base name determined for this file, or None if unmatched)\r\n+ * `predicted_output_name`: str | None (The predicted final filename, e.g., `AssetName_Color_4K.png`, or original name for models/extra)\r\n+ * `status`: str (\"Mapped\", \"Model\", \"Extra\", \"Unrecognised\", \"Ignored\", **\"Unmatched Extra\"** - new status for files with `None` base name).\r\n+ * `details`: str | None\r\n+\r\n+2. **Update `gui.prediction_handler.PredictionHandler`:**\r\n+ * Ensure it correctly passes the results from `get_detailed_file_predictions` (including the new `predicted_asset_name` and `status` values) back to the main window via signals.\r\n+\r\n+3. **Update `gui.main_window.MainWindow`:**\r\n+ * Modify the preview table model/delegate to display the `predicted_asset_name`. A new column might be needed.\r\n+ * Update the logic that colors rows or displays status icons to handle the new \"Unmatched Extra\" status distinctly from regular \"Extra\" or \"Unrecognised\".\r\n+\r\n+**Visual Plan (`AssetProcessor.process` Sequence)**\r\n+\r\n+```mermaid\r\n+sequenceDiagram\r\n+ participant Client as Orchestrator (main.py / GUI Handler)\r\n+ participant AP as AssetProcessor\r\n+ participant Config as Configuration\r\n+ participant FS as File System\r\n+\r\n+ Client->>AP: process(input_path, config, output_base, overwrite)\r\n+ AP->>AP: _setup_workspace()\r\n+ AP->>FS: Create temp_dir\r\n+ AP->>AP: _extract_input()\r\n+ AP->>FS: Extract/Copy files to temp_dir\r\n+ AP->>AP: _inventory_and_classify_files()\r\n+ AP-->>AP: self.classified_files (all files)\r\n+ AP->>AP: _determine_base_metadata()\r\n+ AP-->>AP: distinct_base_names, file_to_base_name_map\r\n+\r\n+ AP->>AP: Initialize overall_status = {}\r\n+ loop For each current_asset_name in distinct_base_names\r\n+ AP->>AP: Log start for current_asset_name\r\n+ AP->>AP: Filter self.classified_files using file_to_base_name_map\r\n+ AP-->>AP: filtered_files_for_asset\r\n+ AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)\r\n+ AP-->>AP: current_asset_metadata (category, archetype)\r\n+ AP->>AP: Perform Skip Check for current_asset_name\r\n+ alt Skip Check == True\r\n+ AP->>AP: Update overall_status (skipped)\r\n+ AP->>AP: continue loop\r\n+ end\r\n+ AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)\r\n+ AP-->>AP: processed_map_details_asset\r\n+ AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)\r\n+ AP-->>AP: merged_map_details_asset\r\n+ AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)\r\n+ AP->>FS: Write metadata.json for current_asset_name\r\n+ AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)\r\n+ AP->>FS: Move processed files for current_asset_name\r\n+ AP->>FS: Copy unmatched files to Extra/ for current_asset_name\r\n+ AP->>AP: Update overall_status (processed/failed for this asset)\r\n+ end\r\n+ AP->>AP: _cleanup_workspace()\r\n+ AP->>FS: Delete temp_dir\r\n+ AP-->>Client: Return overall_status dictionary\n\\ No newline at end of file\n"
}
],
"date": 1745315823241,
"name": "Commit-0",
"content": "---\r\nID: FEAT-004\r\nType: Feature\r\nStatus: Planned\r\nPriority: Medium\r\nLabels: [core, gui, cli, feature, enhancement]\r\nCreated: 2025-04-22\r\nUpdated: 2025-04-22\r\nRelated: #ISSUE-001\r\n---\r\n\r\n# [FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index\r\n\r\n## Description\r\nCurrently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the `source_naming.part_indices.base_name` rule in the preset), the tool's fallback logic uses `os.path.commonprefix` to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.\r\n\r\n## Current Behavior\r\nWhen processing an input containing files from multiple assets (e.g., `3-HeartOak...` and `3-Oak-Classic...` in the same ZIP), the `_determine_base_metadata` method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., \"3-\"). The processing pipeline and GUI prediction then proceed using this incorrect name.\r\n\r\n## Desired Behavior / Goals\r\nThe tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the `source_naming.part_indices.base_name` rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete `metadata.json` file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.\r\n\r\n## Implementation Notes (Optional)\r\n* Modify `AssetProcessor._determine_base_metadata` to return a list of distinct base names and a mapping of files to their determined base names.\r\n* Adjust the main processing orchestration (`main.py`, `gui/processing_handler.py`) to iterate over the list of distinct base names returned by `_determine_base_metadata`.\r\n* For each distinct base name, create a new processing context (potentially a new `AssetProcessor` instance or a modified approach) that operates only on the files associated with that specific base name.\r\n* Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.\r\n* Update `AssetProcessor.get_detailed_file_predictions` to correctly identify and group files by distinct base names for accurate GUI preview display.\r\n* Consider edge cases: what if some files don't match any determined base name? (They should likely still go to 'Extra/'). What if the index method yields no names? (Fallback to input name as currently).\r\n\r\n## Acceptance Criteria (Optional)\r\n* [ ] Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with `base_name_index` results in two separate output directories (`<output_base>/<supplier>/AssetA/` and `<output_base>/<supplier>/AssetB/`), each containing the correctly processed files and metadata for that asset.\r\n* [ ] The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).\r\n* [ ] The CLI processing of a multi-asset input correctly processes and outputs each asset separately.\r\n* [ ] The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra')."
}
]
}