---
ID: FEAT-004
Type: Feature
Status: Complete
Priority: Medium
Labels: [core, gui, cli, feature, enhancement]
Created: 2025-04-22
Updated: 2025-04-22
Related: #ISSUE-001
---
# [FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index
## Description
Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the `source_naming.part_indices.base_name` rule in the preset), the tool's fallback logic uses `os.path.commonprefix` to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.
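The failure mode is easy to reproduce: `os.path.commonprefix` compares strings character by character, not by path components. The stems below are hypothetical, modelled on the truncated `3-HeartOak...` / `3-Oak-Classic...` names cited in this ticket:

```python
import os.path

# Hypothetical stems modelled on the ticket's "3-HeartOak..." /
# "3-Oak-Classic..." example; the full names are illustrative.
stems = ["3-HeartOak_Color_4K", "3-Oak-Classic_Color_4K"]

# commonprefix compares character by character, so two distinct
# assets sharing a leading "3-" collapse to that fragment.
print(os.path.commonprefix(stems))  # → "3-"
```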
## Current Behavior
When processing an input containing files from multiple assets (e.g., `3-HeartOak...` and `3-Oak-Classic...` in the same ZIP), the `_determine_base_metadata` method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., "3-"). The processing pipeline and GUI prediction then proceed using this incorrect name.
## Desired Behavior / Goals
The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the `source_naming.part_indices.base_name` rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete `metadata.json` file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.
## Implementation Notes (Optional)
* Modify `AssetProcessor._determine_base_metadata` to return a list of distinct base names and a mapping of files to their determined base names.
* Adjust the main processing orchestration (`main.py`, `gui/processing_handler.py`) to iterate over the list of distinct base names returned by `_determine_base_metadata`.
* For each distinct base name, create a new processing context (potentially a new `AssetProcessor` instance or a modified approach) that operates only on the files associated with that specific base name.
* Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.
* Update `AssetProcessor.get_detailed_file_predictions` to correctly identify and group files by distinct base names for accurate GUI preview display.
* Consider edge cases: files that do not match any determined base name should likely still go to `Extra/`; if the index method yields no names at all, fall back to using the input name, as the tool does currently.
## Acceptance Criteria (Optional)
* [ ] Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with `base_name_index` results in two separate output directories (`<output_base>/<supplier>/AssetA/` and `<output_base>/<supplier>/AssetB/`), each containing the correctly processed files and metadata for that asset.
* [ ] The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).
* [ ] The CLI processing of a multi-asset input correctly processes and outputs each asset separately.
* [ ] The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').
---
## Implementation Plan (Generated by Architect Mode)
**Goal:** Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the `source_naming.part_indices.base_name` rule, placing unmatched files into the `Extra/` folder of each processed asset.
**Phase 1: Core Logic Refactoring (`asset_processor.py`)**
1. **Refactor `_determine_base_metadata`:**
* **Input:** Takes the list of all file paths (relative to temp dir) found after extraction.
* **Logic:**
* Iterates through relevant file stems (maps, models).
* Uses the `source_naming_separator` and `source_naming_indices['base_name']` to extract potential base names for each file stem.
* Identifies the set of *distinct* base names found across all files.
* Creates a mapping: `Dict[Path, Optional[str]]` where keys are relative file paths and values are the determined base name string (or `None` if a file doesn't match any base name according to the index rule).
* **Output:** Returns a tuple: `(distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]])`.
* **Remove:** Logic setting `self.metadata["asset_name"]`, `asset_category`, and `archetype`.
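A minimal sketch of the refactored method as a standalone function. The separator/index plumbing is an assumption taken from the plan above (`source_naming_separator`, `source_naming_indices['base_name']`); the real method would read these from the preset:

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

def determine_base_metadata(
    file_paths: List[Path],
    separator: str,
    base_name_index: int,
) -> Tuple[List[str], Dict[Path, Optional[str]]]:
    """Split each file stem on the separator and take the configured part.

    Returns the distinct base names found across all files, plus a
    per-file mapping (None when the stem has no part at that index).
    """
    file_to_base: Dict[Path, Optional[str]] = {}
    for path in file_paths:
        parts = path.stem.split(separator)
        if 0 <= base_name_index < len(parts):
            file_to_base[path] = parts[base_name_index]
        else:
            file_to_base[path] = None
    distinct = sorted({name for name in file_to_base.values() if name is not None})
    return distinct, file_to_base
```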
2. **Create New Method `_determine_single_asset_metadata`:**
* **Input:** Takes a specific `asset_base_name` (string) and the list of `classified_files` *filtered* for that asset.
* **Logic:** Contains the logic previously in `_determine_base_metadata` for determining `asset_category` and `archetype` based *only* on the files associated with the given `asset_base_name`.
* **Output:** Returns a dictionary containing `{"asset_category": str, "archetype": str}` for the specific asset.
3. **Modify `_inventory_and_classify_files`:**
* No major changes needed here initially, as it classifies based on file patterns independently of the final asset name. However, ensure the `classified_files` structure remains suitable for later filtering.
4. **Refactor `AssetProcessor.process` Method:**
* Change the overall flow to handle multiple assets.
* **Steps:**
1. `_setup_workspace()`
2. `_extract_input()`
3. `_inventory_and_classify_files()` -> Get initial `self.classified_files` (all files).
4. Call the *new* `_determine_base_metadata()` using all relevant files -> Get `distinct_base_names` list and `file_to_base_name_map`.
5. Initialize an overall status dictionary (e.g., `{"processed": [], "skipped": [], "failed": []}`).
6. **Loop** through each `current_asset_name` in `distinct_base_names`:
* Log the start of processing for `current_asset_name`.
* **Filter Files:** Create temporary filtered lists of maps, models, etc., from `self.classified_files` based on the `file_to_base_name_map` for the `current_asset_name`.
* **Determine Metadata:** Call `_determine_single_asset_metadata(current_asset_name, filtered_files)` -> Get category/archetype for this asset. Store these along with `current_asset_name` and supplier name in a temporary `current_asset_metadata` dict.
* **Skip Check:** Perform the skip check logic specifically for `current_asset_name` using the `output_base_path`, supplier name, and `current_asset_name`. If skipped, update overall status and `continue` to the next asset name.
* **Process:** Call `_process_maps()`, `_merge_maps()`, passing the *filtered* file lists and potentially the `current_asset_metadata`. These methods need to operate only on the provided subset of files.
* **Generate Metadata:** Call `_generate_metadata_file()`, passing the `current_asset_metadata` and the results from map/merge processing for *this asset*. This method will now write `metadata.json` specific to `current_asset_name`.
* **Organize Output:** Call `_organize_output_files()`, passing the `current_asset_name`. This method needs modification:
* It will move the processed files for the *current asset* to the correct subfolder (`<output_base>/<supplier>/<current_asset_name>/`).
* It will also identify files from the *original* input whose base name was `None` in the `file_to_base_name_map` (the "unmatched" files).
* It will copy these "unmatched" files into the `Extra/` subfolder for the *current asset being processed in this loop iteration*.
* Update overall status based on the success/failure of this asset's processing.
7. `_cleanup_workspace()` (only after processing all assets from the input).
8. **Return:** Return the overall status dictionary summarizing results across all detected assets.
5. **Adapt `_process_maps`, `_merge_maps`, `_generate_metadata_file`, `_organize_output_files`:**
* Ensure these methods accept and use the filtered file lists and the specific `asset_name` for the current iteration.
* `_organize_output_files` needs the logic to handle copying the "unmatched" files into the current asset's `Extra/` folder.
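The per-asset filtering and the "unmatched files into `Extra/`" handling described in the loop could be sketched as follows. The flat file list and the two category keys are illustrative assumptions, not the tool's actual `classified_files` structure:

```python
from pathlib import Path
from typing import Dict, List, Optional

def split_files_for_asset(
    all_files: List[Path],
    file_to_base: Dict[Path, Optional[str]],
    current_asset_name: str,
) -> Dict[str, List[Path]]:
    """Partition the input files for one loop iteration.

    'asset' files are processed and moved for the current asset;
    'unmatched' files (base name None) are copied into the current
    asset's Extra/ folder.
    """
    return {
        "asset": [f for f in all_files if file_to_base.get(f) == current_asset_name],
        "unmatched": [f for f in all_files if file_to_base.get(f) is None],
    }
```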
**Phase 2: Update Orchestration (`main.py`, `gui/processing_handler.py`)**
1. **Modify `main.process_single_asset_wrapper`:**
* The call `processor.process()` will now return the overall status dictionary.
* The wrapper needs to interpret this dictionary to return a single representative status ("processed" if any succeeded, "skipped" if all skipped, "failed" if any failed) and potentially a consolidated error message for the main loop/GUI.
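A sketch of that consolidation rule, assuming the overall status dict maps status keys to lists of asset names and that "failed" takes precedence over "processed" (the precedence is an assumption; the ticket only lists the three cases):

```python
from typing import Dict, List

def consolidate_status(overall: Dict[str, List[str]]) -> str:
    """Collapse the per-asset status dict into one representative status:
    'failed' if any asset failed, else 'processed' if any succeeded,
    else 'skipped' (everything was skipped)."""
    if overall.get("failed"):
        return "failed"
    if overall.get("processed"):
        return "processed"
    return "skipped"
```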
2. **Modify `gui.processing_handler.ProcessingHandler.run`:**
* No major changes needed here, as it relies on `process_single_asset_wrapper`. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future, but for now, the overall status from the wrapper should suffice.
**Phase 3: Update GUI Prediction (`asset_processor.py`, `gui/prediction_handler.py`, `gui/main_window.py`)**
1. **Modify `AssetProcessor.get_detailed_file_predictions`:**
* This method must now perform the multi-asset detection:
* Call the refactored `_determine_base_metadata` to get the `distinct_base_names` and `file_to_base_name_map`.
* Iterate through all classified files (maps, models, extra, ignored).
* For each file, look up its corresponding base name in the `file_to_base_name_map`.
* The returned dictionary for each file should now include:
* `original_path`: str
* `predicted_asset_name`: str | None (The base name determined for this file, or None if unmatched)
* `predicted_output_name`: str | None (The predicted final filename, e.g., `AssetName_Color_4K.png`, or original name for models/extra)
* `status`: str ("Mapped", "Model", "Extra", "Unrecognised", "Ignored", **"Unmatched Extra"** - new status for files with `None` base name).
* `details`: str | None
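The per-file prediction entry could be pinned down with a `TypedDict` — a sketch of the shape listed above, not the tool's actual type:

```python
from typing import Optional, TypedDict

class FilePrediction(TypedDict):
    original_path: str
    predicted_asset_name: Optional[str]   # base name for this file, None if unmatched
    predicted_output_name: Optional[str]  # e.g. "AssetName_Color_4K.png"
    status: str     # "Mapped", "Model", "Extra", "Unrecognised", "Ignored", "Unmatched Extra"
    details: Optional[str]
```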
2. **Update `gui.prediction_handler.PredictionHandler`:**
* Ensure it correctly passes the results from `get_detailed_file_predictions` (including the new `predicted_asset_name` and `status` values) back to the main window via signals.
3. **Update `gui.main_window.MainWindow`:**
* Modify the preview table model/delegate to display the `predicted_asset_name`. A new column might be needed.
* Update the logic that colors rows or displays status icons to handle the new "Unmatched Extra" status distinctly from regular "Extra" or "Unrecognised".
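One way to keep the new status visually distinct in the preview table; the colour values are placeholders — the point is only that "Unmatched Extra" gets its own entry rather than reusing the "Extra" or "Unrecognised" colour:

```python
# Hypothetical status → row-colour mapping for the preview table.
STATUS_COLOURS = {
    "Mapped": "#c8e6c9",
    "Model": "#bbdefb",
    "Extra": "#fff9c4",
    "Unmatched Extra": "#ffe0b2",
    "Unrecognised": "#ffcdd2",
    "Ignored": "#e0e0e0",
}

def row_colour(status: str) -> str:
    """Return the row colour for a status, defaulting to white."""
    return STATUS_COLOURS.get(status, "#ffffff")
```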
**Visual Plan (`AssetProcessor.process` Sequence)**
```mermaid
sequenceDiagram
    participant Client as Orchestrator (main.py / GUI Handler)
    participant AP as AssetProcessor
    participant Config as Configuration
    participant FS as File System
    Client->>AP: process(input_path, config, output_base, overwrite)
    AP->>AP: _setup_workspace()
    AP->>FS: Create temp_dir
    AP->>AP: _extract_input()
    AP->>FS: Extract/Copy files to temp_dir
    AP->>AP: _inventory_and_classify_files()
    AP-->>AP: self.classified_files (all files)
    AP->>AP: _determine_base_metadata()
    AP-->>AP: distinct_base_names, file_to_base_name_map
    AP->>AP: Initialize overall_status = {}
    loop For each current_asset_name in distinct_base_names
        AP->>AP: Log start for current_asset_name
        AP->>AP: Filter self.classified_files using file_to_base_name_map
        AP-->>AP: filtered_files_for_asset
        AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)
        AP-->>AP: current_asset_metadata (category, archetype)
        AP->>AP: Perform Skip Check for current_asset_name
        alt Skip Check == True
            AP->>AP: Update overall_status (skipped)
            AP->>AP: continue loop
        end
        AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: processed_map_details_asset
        AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: merged_map_details_asset
        AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)
        AP->>FS: Write metadata.json for current_asset_name
        AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)
        AP->>FS: Move processed files for current_asset_name
        AP->>FS: Copy unmatched files to Extra/ for current_asset_name
        AP->>AP: Update overall_status (processed/failed for this asset)
    end
    AP->>AP: _cleanup_workspace()
    AP->>FS: Delete temp_dir
    AP-->>Client: Return overall_status dictionary
```