---
ID: FEAT-004
Type: Feature
Status: Complete
Priority: Medium
Labels: [core, gui, cli, feature, enhancement]
Created: 2025-04-22
Updated: 2025-04-22
Related: #ISSUE-001
---

# [FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index

## Description

Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the `source_naming.part_indices.base_name` rule in the preset), the tool's fallback logic uses `os.path.commonprefix` to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.

## Current Behavior

When processing an input containing files from multiple assets (e.g., `3-HeartOak...` and `3-Oak-Classic...` in the same ZIP), the `_determine_base_metadata` method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., "3-"). The processing pipeline and GUI prediction then proceed using this incorrect name.
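
The truncation falls directly out of how `os.path.commonprefix` behaves; a minimal illustration (the full stems are hypothetical, filled in only to show the effect, since the real names are elided above):

```python
import os

# Illustrative stems standing in for "3-HeartOak..." and "3-Oak-Classic...".
stems = ["3-HeartOak_Color_4K", "3-Oak-Classic_Color_4K"]

# The current fallback collapses all stems to their longest shared prefix,
# which is just "3-" and not a usable asset name.
print(os.path.commonprefix(stems))  # -> 3-
```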

## Desired Behavior / Goals

The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the `source_naming.part_indices.base_name` rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete `metadata.json` file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.

## Implementation Notes (Optional)

* Modify `AssetProcessor._determine_base_metadata` to return a list of distinct base names and a mapping of files to their determined base names.
* Adjust the main processing orchestration (`main.py`, `gui/processing_handler.py`) to iterate over the list of distinct base names returned by `_determine_base_metadata`.
* For each distinct base name, create a new processing context (potentially a new `AssetProcessor` instance or a modified approach) that operates only on the files associated with that specific base name.
* Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.
* Update `AssetProcessor.get_detailed_file_predictions` to correctly identify and group files by distinct base names for accurate GUI preview display.
* Consider edge cases: if some files don't match any determined base name, they should likely still go to `Extra/`; if the index method yields no names, fall back to the input name, as currently.

## Acceptance Criteria (Optional)

* [ ] Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with `base_name_index` results in two separate output directories (`<output_base>/<supplier>/AssetA/` and `<output_base>/<supplier>/AssetB/`), each containing the correctly processed files and metadata for that asset.
* [ ] The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).
* [ ] The CLI processing of a multi-asset input correctly processes and outputs each asset separately.
* [ ] The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').

---

## Implementation Plan (Generated by Architect Mode)

**Goal:** Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the `source_naming.part_indices.base_name` rule, placing unmatched files into the `Extra/` folder of each processed asset.

**Phase 1: Core Logic Refactoring (`asset_processor.py`)**

1. **Refactor `_determine_base_metadata`:**
    * **Input:** Takes the list of all file paths (relative to the temp dir) found after extraction.
    * **Logic:**
        * Iterates through relevant file stems (maps, models).
        * Uses the `source_naming_separator` and `source_naming_indices['base_name']` to extract potential base names for each file stem.
        * Identifies the set of *distinct* base names found across all files.
        * Creates a mapping: `Dict[Path, Optional[str]]` where keys are relative file paths and values are the determined base name string (or `None` if a file doesn't match any base name according to the index rule).
    * **Output:** Returns a tuple: `(distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]])`.
    * **Remove:** Logic setting `self.metadata["asset_name"]`, `asset_category`, and `archetype`.

2. **Create New Method `_determine_single_asset_metadata`:**
    * **Input:** Takes a specific `asset_base_name` (string) and the list of `classified_files` *filtered* for that asset.
    * **Logic:** Contains the logic previously in `_determine_base_metadata` for determining `asset_category` and `archetype`, based *only* on the files associated with the given `asset_base_name`.
    * **Output:** Returns a dictionary containing `{"asset_category": str, "archetype": str}` for the specific asset.

3. **Modify `_inventory_and_classify_files`:**
    * No major changes needed here initially, as it classifies based on file patterns independent of the final asset name. However, ensure the `classified_files` structure remains suitable for later filtering.

4. **Refactor `AssetProcessor.process` Method:**
    * Change the overall flow to handle multiple assets.
    * **Steps:**
        1. `_setup_workspace()`
        2. `_extract_input()`
        3. `_inventory_and_classify_files()` -> Get the initial `self.classified_files` (all files).
        4. Call the *new* `_determine_base_metadata()` using all relevant files -> Get the `distinct_base_names` list and `file_to_base_name_map`.
        5. Initialize an overall status dictionary (e.g., `{"processed": [], "skipped": [], "failed": []}`).
        6. **Loop** through each `current_asset_name` in `distinct_base_names`:
            * Log the start of processing for `current_asset_name`.
            * **Filter Files:** Create temporary filtered lists of maps, models, etc., from `self.classified_files` based on the `file_to_base_name_map` for the `current_asset_name`.
            * **Determine Metadata:** Call `_determine_single_asset_metadata(current_asset_name, filtered_files)` -> Get the category/archetype for this asset. Store these along with `current_asset_name` and the supplier name in a temporary `current_asset_metadata` dict.
            * **Skip Check:** Perform the skip-check logic specifically for `current_asset_name` using the `output_base_path`, supplier name, and `current_asset_name`. If skipped, update the overall status and `continue` to the next asset name.
            * **Process:** Call `_process_maps()` and `_merge_maps()`, passing the *filtered* file lists and potentially the `current_asset_metadata`. These methods need to operate only on the provided subset of files.
            * **Generate Metadata:** Call `_generate_metadata_file()`, passing the `current_asset_metadata` and the results from map/merge processing for *this asset*. This method will now write a `metadata.json` specific to `current_asset_name`.
            * **Organize Output:** Call `_organize_output_files()`, passing the `current_asset_name`. This method needs modification:
                * It will move the processed files for the *current asset* to the correct subfolder (`<output_base>/<supplier>/<current_asset_name>/`).
                * It will also identify files from the *original* input whose base name was `None` in the `file_to_base_name_map` (the "unmatched" files).
                * It will copy these "unmatched" files into the `Extra/` subfolder for the *current asset being processed in this loop iteration*.
            * Update the overall status based on the success/failure of this asset's processing.
        7. `_cleanup_workspace()` (only after processing all assets from the input).
        8. **Return:** Return the overall status dictionary summarizing results across all detected assets.
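
The per-asset loop can be sketched as below; `already_done` stands in for the real skip check (an existing output directory for that asset), and the map/merge/metadata/organize calls are elided:

```python
def process_assets(distinct_base_names, file_to_base_name_map, already_done):
    """Skeleton of the refactored per-asset loop in AssetProcessor.process."""
    overall_status = {"processed": [], "skipped": [], "failed": []}
    for current_asset_name in distinct_base_names:
        # Filter the input's files down to this asset's subset.
        asset_files = [f for f, b in file_to_base_name_map.items()
                       if b == current_asset_name]
        if current_asset_name in already_done:   # stand-in for the skip check
            overall_status["skipped"].append(current_asset_name)
            continue
        if not asset_files:
            overall_status["failed"].append(current_asset_name)
            continue
        # _process_maps / _merge_maps / _generate_metadata_file /
        # _organize_output_files would run here, on asset_files only.
        overall_status["processed"].append(current_asset_name)
    return overall_status
```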

5. **Adapt `_process_maps`, `_merge_maps`, `_generate_metadata_file`, `_organize_output_files`:**
    * Ensure these methods accept and use the filtered file lists and the specific `asset_name` for the current iteration.
    * `_organize_output_files` needs the logic to handle copying the "unmatched" files into the current asset's `Extra/` folder.
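
The unmatched-file handling in `_organize_output_files` might look like this helper (function name and demo paths are hypothetical):

```python
import shutil
import tempfile
from pathlib import Path

def copy_unmatched_to_extra(temp_dir: Path, asset_out_dir: Path, file_map: dict) -> list:
    """Copy files whose determined base name is None into this asset's Extra/ folder."""
    extra_dir = asset_out_dir / "Extra"
    extra_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for rel_path, base_name in file_map.items():
        if base_name is None:
            shutil.copy2(temp_dir / rel_path, extra_dir / Path(rel_path).name)
            copied.append(rel_path)
    return copied

# Demo against a throwaway workspace.
workspace = Path(tempfile.mkdtemp())
(workspace / "stray.txt").write_text("x")
copied = copy_unmatched_to_extra(workspace, workspace / "out" / "AssetA",
                                 {"stray.txt": None, "AssetA_Color.png": "AssetA"})
print(copied)  # -> ['stray.txt']
```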

**Phase 2: Update Orchestration (`main.py`, `gui/processing_handler.py`)**

1. **Modify `main.process_single_asset_wrapper`:**
    * The call `processor.process()` will now return the overall status dictionary.
    * The wrapper needs to interpret this dictionary to return a single representative status ("processed" if any succeeded, "skipped" if all skipped, "failed" if any failed) and potentially a consolidated error message for the main loop/GUI.
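
The consolidation rule can be sketched as follows; giving "failed" precedence over "processed" when both occur is one reasonable reading of the criteria above, not a settled decision:

```python
def consolidate_status(overall_status: dict) -> str:
    """Collapse the per-asset status dict into one representative status:
    'failed' if any asset failed, else 'processed' if any succeeded,
    else 'skipped'."""
    if overall_status.get("failed"):
        return "failed"
    if overall_status.get("processed"):
        return "processed"
    return "skipped"
```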

2. **Modify `gui.processing_handler.ProcessingHandler.run`:**
    * No major changes needed here, as it relies on `process_single_asset_wrapper`. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future, but for now the overall status from the wrapper should suffice.

**Phase 3: Update GUI Prediction (`asset_processor.py`, `gui/prediction_handler.py`, `gui/main_window.py`)**

1. **Modify `AssetProcessor.get_detailed_file_predictions`:**
    * This method must now perform the multi-asset detection:
        * Call the refactored `_determine_base_metadata` to get the `distinct_base_names` and `file_to_base_name_map`.
        * Iterate through all classified files (maps, models, extra, ignored).
        * For each file, look up its corresponding base name in the `file_to_base_name_map`.
    * The returned dictionary for each file should now include:
        * `original_path`: str
        * `predicted_asset_name`: str | None (the base name determined for this file, or None if unmatched)
        * `predicted_output_name`: str | None (the predicted final filename, e.g., `AssetName_Color_4K.png`, or the original name for models/extra files)
        * `status`: str ("Mapped", "Model", "Extra", "Unrecognised", "Ignored", or the new **"Unmatched Extra"** status for files with a `None` base name)
        * `details`: str | None
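
Assembling one prediction row might look like this; the exact precedence between a `None` base name and the "Ignored" classification is an assumption here:

```python
def build_prediction(original_path, base_name, output_name, classification):
    """Assemble one per-file prediction row for the GUI preview.
    Files whose base name could not be determined get the new
    'Unmatched Extra' status (Ignored files are assumed to stay Ignored)."""
    status = classification
    if base_name is None and classification != "Ignored":
        status = "Unmatched Extra"
    return {
        "original_path": original_path,
        "predicted_asset_name": base_name,
        "predicted_output_name": output_name,
        "status": status,
        "details": None,
    }
```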

2. **Update `gui.prediction_handler.PredictionHandler`:**
    * Ensure it correctly passes the results from `get_detailed_file_predictions` (including the new `predicted_asset_name` and `status` values) back to the main window via signals.

3. **Update `gui.main_window.MainWindow`:**
    * Modify the preview table model/delegate to display the `predicted_asset_name`. A new column might be needed.
    * Update the logic that colors rows or displays status icons to handle the new "Unmatched Extra" status distinctly from the regular "Extra" or "Unrecognised" statuses.

**Visual Plan (`AssetProcessor.process` Sequence)**

```mermaid
sequenceDiagram
    participant Client as Orchestrator (main.py / GUI Handler)
    participant AP as AssetProcessor
    participant Config as Configuration
    participant FS as File System

    Client->>AP: process(input_path, config, output_base, overwrite)
    AP->>AP: _setup_workspace()
    AP->>FS: Create temp_dir
    AP->>AP: _extract_input()
    AP->>FS: Extract/Copy files to temp_dir
    AP->>AP: _inventory_and_classify_files()
    AP-->>AP: self.classified_files (all files)
    AP->>AP: _determine_base_metadata()
    AP-->>AP: distinct_base_names, file_to_base_name_map

    AP->>AP: Initialize overall_status = {}
    loop For each current_asset_name in distinct_base_names
        AP->>AP: Log start for current_asset_name
        AP->>AP: Filter self.classified_files using file_to_base_name_map
        AP-->>AP: filtered_files_for_asset
        AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)
        AP-->>AP: current_asset_metadata (category, archetype)
        AP->>AP: Perform Skip Check for current_asset_name
        alt Skip Check == True
            AP->>AP: Update overall_status (skipped)
            AP->>AP: continue loop
        end
        AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: processed_map_details_asset
        AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: merged_map_details_asset
        AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)
        AP->>FS: Write metadata.json for current_asset_name
        AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)
        AP->>FS: Move processed files for current_asset_name
        AP->>FS: Copy unmatched files to Extra/ for current_asset_name
        AP->>AP: Update overall_status (processed/failed for this asset)
    end
    AP->>AP: _cleanup_workspace()
    AP->>FS: Delete temp_dir
    AP-->>Client: Return overall_status dictionary
```