---
ID: FEAT-004
Type: Feature
Status: Complete
Priority: Medium
Labels: [core, gui, cli, feature, enhancement]
Created: 2025-04-22
Updated: 2025-04-22
Related: #ISSUE-001
---

# [FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index

## Description

Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the `source_naming.part_indices.base_name` rule in the preset), the tool's fallback logic uses `os.path.commonprefix` to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs that contain multiple assets and leads to incorrect predictions in the GUI.

## Current Behavior

When processing an input containing files from multiple assets (e.g., `3-HeartOak...` and `3-Oak-Classic...` in the same ZIP), the `_determine_base_metadata` method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., "3-"). The processing pipeline and GUI prediction then proceed using this incorrect name.

## Desired Behavior / Goals

The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the `source_naming.part_indices.base_name` rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete `metadata.json` file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.

## Implementation Notes (Optional)

* Modify `AssetProcessor._determine_base_metadata` to return a list of distinct base names and a mapping of files to their determined base names.
* Adjust the main processing orchestration (`main.py`, `gui/processing_handler.py`) to iterate over the list of distinct base names returned by `_determine_base_metadata`.
* For each distinct base name, create a new processing context (potentially a new `AssetProcessor` instance or a modified approach) that operates only on the files associated with that specific base name.
* Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.
* Update `AssetProcessor.get_detailed_file_predictions` to correctly identify and group files by distinct base names for accurate GUI preview display.
* Consider edge cases: what if some files don't match any determined base name? (They should likely still go to `Extra/`.) What if the index method yields no names? (Fall back to the input name, as currently.)

## Acceptance Criteria (Optional)

* [ ] Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with `base_name_index` results in two separate output directories (`<output_base>/<supplier>/AssetA/` and `<output_base>/<supplier>/AssetB/`), each containing the correctly processed files and metadata for that asset.
* [ ] The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).
* [ ] The CLI processing of a multi-asset input correctly processes and outputs each asset separately.
* [ ] The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').
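
For illustration, a minimal sketch of why the `os.path.commonprefix` fallback fails for multi-asset inputs and what index-based extraction should yield instead. The file stems, the `_` separator, and the base-name index `0` are hypothetical examples, not values from a real preset:

```python
import os

# Hypothetical stems from a single ZIP containing two distinct assets.
stems = ["AssetA_Color_4K", "AssetA_Normal_4K", "AssetB_Color_4K"]

# Current fallback: a single (wrong) name derived from the common prefix.
print(os.path.commonprefix(stems))  # -> "Asset" (neither AssetA nor AssetB)

# Desired: one base name per file, using a hypothetical separator and index.
separator, base_name_index = "_", 0
base_names = {stem: stem.split(separator)[base_name_index] for stem in stems}
print(sorted(set(base_names.values())))  # -> ["AssetA", "AssetB"]
```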

---

## Implementation Plan (Generated by Architect Mode)

**Goal:** Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the `source_naming.part_indices.base_name` rule, placing unmatched files into the `Extra/` folder of each processed asset.

**Phase 1: Core Logic Refactoring (`asset_processor.py`)**

1. **Refactor `_determine_base_metadata`:**
    * **Input:** Takes the list of all file paths (relative to the temp dir) found after extraction.
    * **Logic:**
        * Iterates through relevant file stems (maps, models).
        * Uses the `source_naming_separator` and `source_naming_indices['base_name']` to extract a potential base name from each file stem.
        * Identifies the set of *distinct* base names found across all files.
        * Creates a mapping: `Dict[Path, Optional[str]]` where keys are relative file paths and values are the determined base name string (or `None` if a file doesn't match any base name according to the index rule).
    * **Output:** Returns a tuple: `(distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]])`. (See the sketch after this list.)
    * **Remove:** Logic setting `self.metadata["asset_name"]`, `asset_category`, and `archetype`.
2. **Create New Method `_determine_single_asset_metadata`:**
    * **Input:** Takes a specific `asset_base_name` (string) and the list of `classified_files` *filtered* for that asset.
    * **Logic:** Contains the logic previously in `_determine_base_metadata` for determining `asset_category` and `archetype`, based *only* on the files associated with the given `asset_base_name`.
    * **Output:** Returns a dictionary containing `{"asset_category": str, "archetype": str}` for the specific asset.
3. **Modify `_inventory_and_classify_files`:**
    * No major changes needed here initially, as it classifies based on file patterns independent of the final asset name. However, ensure the `classified_files` structure remains suitable for later filtering.
4. **Refactor `AssetProcessor.process` Method:**
    * Change the overall flow to handle multiple assets.
    * **Steps:**
        1. `_setup_workspace()`
        2. `_extract_input()`
        3. `_inventory_and_classify_files()` -> Get initial `self.classified_files` (all files).
        4. Call the *new* `_determine_base_metadata()` using all relevant files -> Get the `distinct_base_names` list and `file_to_base_name_map`.
        5. Initialize an overall status dictionary (e.g., `{"processed": [], "skipped": [], "failed": []}`).
        6. **Loop** through each `current_asset_name` in `distinct_base_names`:
            * Log the start of processing for `current_asset_name`.
            * **Filter Files:** Create temporary filtered lists of maps, models, etc., from `self.classified_files` based on the `file_to_base_name_map` for the `current_asset_name`.
            * **Determine Metadata:** Call `_determine_single_asset_metadata(current_asset_name, filtered_files)` -> Get the category/archetype for this asset. Store these, along with `current_asset_name` and the supplier name, in a temporary `current_asset_metadata` dict.
            * **Skip Check:** Perform the skip-check logic specifically for `current_asset_name` using the `output_base_path`, supplier name, and `current_asset_name`. If skipped, update the overall status and `continue` to the next asset name.
            * **Process:** Call `_process_maps()` and `_merge_maps()`, passing the *filtered* file lists and potentially the `current_asset_metadata`. These methods need to operate only on the provided subset of files.
            * **Generate Metadata:** Call `_generate_metadata_file()`, passing the `current_asset_metadata` and the results from map/merge processing for *this asset*. This method will now write a `metadata.json` specific to `current_asset_name`.
            * **Organize Output:** Call `_organize_output_files()`, passing the `current_asset_name`. This method needs modification:
                * It will move the processed files for the *current asset* to the correct subfolder (`<output_base>/<supplier>/<current_asset_name>/`).
                * It will also identify files from the *original* input whose base name was `None` in the `file_to_base_name_map` (the "unmatched" files).
                * It will copy these "unmatched" files into the `Extra/` subfolder of the *current asset being processed in this loop iteration*.
            * Update the overall status based on the success/failure of this asset's processing.
        7. `_cleanup_workspace()` (only after processing all assets from the input).
        8. **Return:** Return the overall status dictionary summarizing results across all detected assets.
5. **Adapt `_process_maps`, `_merge_maps`, `_generate_metadata_file`, `_organize_output_files`:**
    * Ensure these methods accept and use the filtered file lists and the specific `asset_name` for the current iteration.
    * `_organize_output_files` needs the logic to handle copying the "unmatched" files into the current asset's `Extra/` folder.
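
A minimal sketch of the refactored base-name detection from step 1, written as a free function with the separator and index passed in directly. In the real method these values would come from the loaded preset, and the parameter names here are assumptions:

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def determine_base_metadata(
    relevant_files: List[Path],  # map/model paths relative to the temp dir
    separator: str,              # e.g. the preset's source_naming_separator
    base_name_index: int,        # e.g. source_naming_indices["base_name"]
) -> Tuple[List[str], Dict[Path, Optional[str]]]:
    """Return the distinct base names and a per-file base-name mapping."""
    file_to_base_name: Dict[Path, Optional[str]] = {}
    distinct_base_names: List[str] = []
    for path in relevant_files:
        parts = path.stem.split(separator)
        if len(parts) > base_name_index:
            base = parts[base_name_index]
            file_to_base_name[path] = base
            if base not in distinct_base_names:
                distinct_base_names.append(base)
        else:
            # The stem yields no base name under the index rule: unmatched,
            # to be routed to Extra/ later.
            file_to_base_name[path] = None
    return distinct_base_names, file_to_base_name
```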
**Phase 2: Update Orchestration (`main.py`, `gui/processing_handler.py`)**

1. **Modify `main.process_single_asset_wrapper`:**
    * The call `processor.process()` will now return the overall status dictionary.
    * The wrapper needs to interpret this dictionary to return a single representative status ("processed" if any succeeded, "skipped" if all skipped, "failed" if any failed) and potentially a consolidated error message for the main loop/GUI.
2. **Modify `gui.processing_handler.ProcessingHandler.run`:**
    * No major changes needed here, as it relies on `process_single_asset_wrapper`. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future, but for now the overall status from the wrapper should suffice.

**Phase 3: Update GUI Prediction (`asset_processor.py`, `gui/prediction_handler.py`, `gui/main_window.py`)**

1. **Modify `AssetProcessor.get_detailed_file_predictions`:**
    * This method must now perform the multi-asset detection:
        * Call the refactored `_determine_base_metadata` to get the `distinct_base_names` and `file_to_base_name_map`.
        * Iterate through all classified files (maps, models, extra, ignored).
        * For each file, look up its corresponding base name in the `file_to_base_name_map`.
    * The returned dictionary for each file should now include (see the sketch after this list):
        * `original_path`: str
        * `predicted_asset_name`: str | None (the base name determined for this file, or None if unmatched)
        * `predicted_output_name`: str | None (the predicted final filename, e.g., `AssetName_Color_4K.png`, or the original name for models/extra files)
        * `status`: str ("Mapped", "Model", "Extra", "Unrecognised", "Ignored", or the new **"Unmatched Extra"** status for files with a `None` base name)
        * `details`: str | None
2. **Update `gui.prediction_handler.PredictionHandler`:**
    * Ensure it correctly passes the results from `get_detailed_file_predictions` (including the new `predicted_asset_name` and `status` values) back to the main window via signals.
3. **Update `gui.main_window.MainWindow`:**
    * Modify the preview table model/delegate to display the `predicted_asset_name`. A new column might be needed.
    * Update the logic that colors rows or displays status icons to handle the new "Unmatched Extra" status distinctly from regular "Extra" or "Unrecognised".
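
A rough sketch of the per-file prediction entry described in Phase 3, using a `TypedDict` purely for illustration (the actual return value may remain a plain dict):

```python
from typing import Optional, TypedDict


class FilePrediction(TypedDict):
    """Illustrative shape of one entry returned by get_detailed_file_predictions."""
    original_path: str
    predicted_asset_name: Optional[str]   # base name from file_to_base_name_map, None if unmatched
    predicted_output_name: Optional[str]  # e.g. "AssetName_Color_4K.png", or the original name
    status: str                           # "Mapped", "Model", "Extra", "Unrecognised",
                                          # "Ignored", or the new "Unmatched Extra"
    details: Optional[str]
```

And a sketch of how `process_single_asset_wrapper` (Phase 2) might collapse the per-asset status dictionary into a single representative status. The function name `consolidate_status` is hypothetical, and letting failure dominate is one possible reading of the rule stated above:

```python
def consolidate_status(overall_status: dict) -> str:
    """Collapse {'processed': [...], 'skipped': [...], 'failed': [...]} to one status."""
    if overall_status.get("failed"):
        return "failed"      # any failed asset marks the whole input as failed
    if overall_status.get("processed"):
        return "processed"   # at least one asset succeeded
    return "skipped"         # nothing processed, nothing failed -> all skipped
```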
**Visual Plan (`AssetProcessor.process` Sequence)**

```mermaid
sequenceDiagram
    participant Client as Orchestrator (main.py / GUI Handler)
    participant AP as AssetProcessor
    participant Config as Configuration
    participant FS as File System

    Client->>AP: process(input_path, config, output_base, overwrite)
    AP->>AP: _setup_workspace()
    AP->>FS: Create temp_dir
    AP->>AP: _extract_input()
    AP->>FS: Extract/Copy files to temp_dir
    AP->>AP: _inventory_and_classify_files()
    AP-->>AP: self.classified_files (all files)
    AP->>AP: _determine_base_metadata()
    AP-->>AP: distinct_base_names, file_to_base_name_map
    AP->>AP: Initialize overall_status = {}

    loop For each current_asset_name in distinct_base_names
        AP->>AP: Log start for current_asset_name
        AP->>AP: Filter self.classified_files using file_to_base_name_map
        AP-->>AP: filtered_files_for_asset
        AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)
        AP-->>AP: current_asset_metadata (category, archetype)
        AP->>AP: Perform Skip Check for current_asset_name
        alt Skip Check == True
            AP->>AP: Update overall_status (skipped)
            AP->>AP: continue loop
        end
        AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: processed_map_details_asset
        AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: merged_map_details_asset
        AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)
        AP->>FS: Write metadata.json for current_asset_name
        AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)
        AP->>FS: Move processed files for current_asset_name
        AP->>FS: Copy unmatched files to Extra/ for current_asset_name
        AP->>AP: Update overall_status (processed/failed for this asset)
    end

    AP->>AP: _cleanup_workspace()
    AP->>FS: Delete temp_dir
    AP-->>Client: Return overall_status dictionary
```
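
To complement the diagram, a compact sketch of the refactored `process` method body. It is not runnable standalone, since it composes the processor's own helpers; the helper names `_filter_classified_files` and `_should_skip_asset` are placeholders for logic described above, and the exact signatures are assumptions:

```python
def process(self) -> dict:
    """Sketch of the multi-asset process() flow shown in the sequence diagram."""
    overall_status = {"processed": [], "skipped": [], "failed": []}
    self._setup_workspace()
    try:
        self._extract_input()
        self._inventory_and_classify_files()
        distinct_base_names, file_map = self._determine_base_metadata()

        for current_asset_name in distinct_base_names:
            # Hypothetical helper: subset of self.classified_files for this asset.
            filtered_files = self._filter_classified_files(file_map, current_asset_name)
            current_asset_metadata = self._determine_single_asset_metadata(
                current_asset_name, filtered_files
            )
            # Hypothetical helper wrapping the per-asset skip check.
            if self._should_skip_asset(current_asset_name):
                overall_status["skipped"].append(current_asset_name)
                continue
            try:
                processed = self._process_maps(filtered_files, current_asset_metadata)
                merged = self._merge_maps(filtered_files, current_asset_metadata)
                self._generate_metadata_file(current_asset_metadata, processed, merged)
                self._organize_output_files(current_asset_name, file_map)
                overall_status["processed"].append(current_asset_name)
            except Exception as exc:
                overall_status["failed"].append((current_asset_name, str(exc)))
    finally:
        # Cleanup runs only after all detected assets from this input are handled.
        self._cleanup_workspace()
    return overall_status
```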