---
ID: FEAT-004
Type: Feature
Status: Complete
Priority: Medium
Labels: [core, gui, cli, feature, enhancement]
Created: 2025-04-22
Updated: 2025-04-22
Related: #ISSUE-001
---
# [FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index
## Description
Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the `source_naming.part_indices.base_name` rule in the preset), the tool's fallback logic uses `os.path.commonprefix` to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.
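The failure mode is easy to reproduce: `os.path.commonprefix` compares strings character by character, not by path components. The stems below are hypothetical, modelled on the truncated `3-HeartOak...` / `3-Oak-Classic...` names cited in this ticket:

```python
import os.path

# Hypothetical stems modelled on the ticket's "3-HeartOak..." /
# "3-Oak-Classic..." example; the full names are illustrative.
stems = ["3-HeartOak_Color_4K", "3-Oak-Classic_Color_4K"]

# commonprefix compares character by character, so two distinct
# assets sharing a leading "3-" collapse to that fragment.
print(os.path.commonprefix(stems))  # → "3-"
```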
## Current Behavior
When processing an input containing files from multiple assets (e.g., `3-HeartOak...` and `3-Oak-Classic...` in the same ZIP), the `_determine_base_metadata` method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., "3-"). The processing pipeline and GUI prediction then proceed using this incorrect name.
## Desired Behavior / Goals
The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the `source_naming.part_indices.base_name` rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete `metadata.json` file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.
## Implementation Notes (Optional)
* Modify `AssetProcessor._determine_base_metadata` to return a list of distinct base names and a mapping of files to their determined base names.
* Adjust the main processing orchestration (`main.py`, `gui/processing_handler.py`) to iterate over the list of distinct base names returned by `_determine_base_metadata`.
* For each distinct base name, create a new processing context (potentially a new `AssetProcessor` instance or a modified approach) that operates only on the files associated with that specific base name.
* Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.
* Update `AssetProcessor.get_detailed_file_predictions` to correctly identify and group files by distinct base names for accurate GUI preview display.
* Consider edge cases: files that do not match any determined base name should likely still go to `Extra/`; if the index method yields no names at all, fall back to using the input name, as the tool does currently.
## Acceptance Criteria (Optional)
* [ ] Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with `base_name_index` results in two separate output directories (`<output_base>/<supplier>/AssetA/` and `<output_base>/<supplier>/AssetB/`), each containing the correctly processed files and metadata for that asset.
* [ ] The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).
* [ ] The CLI processing of a multi-asset input correctly processes and outputs each asset separately.
* [ ] The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').
---
## Implementation Plan (Generated by Architect Mode)
**Goal:** Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the `source_naming.part_indices.base_name` rule, placing unmatched files into the `Extra/` folder of each processed asset.
**Phase 1: Core Logic Refactoring (`asset_processor.py`)**
1. **Refactor `_determine_base_metadata`:**
* **Input:** Takes the list of all file paths (relative to temp dir) found after extraction.
* **Logic:**
* Iterates through relevant file stems (maps, models).
* Uses the `source_naming_separator` and `source_naming_indices['base_name']` to extract potential base names for each file stem.
* Identifies the set of *distinct* base names found across all files.
* Creates a mapping: `Dict[Path, Optional[str]]` where keys are relative file paths and values are the determined base name string (or `None` if a file doesn't match any base name according to the index rule).
* **Output:** Returns a tuple: `(distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]])`.
* **Remove:** Logic setting `self.metadata["asset_name"]`, `asset_category`, and `archetype`.
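A minimal sketch of the refactored method as a standalone function. The separator/index plumbing is an assumption taken from the plan above (`source_naming_separator`, `source_naming_indices['base_name']`); the real method would read these from the preset:

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

def determine_base_metadata(
    file_paths: List[Path],
    separator: str,
    base_name_index: int,
) -> Tuple[List[str], Dict[Path, Optional[str]]]:
    """Split each file stem on the separator and take the configured part.

    Returns the distinct base names found across all files, plus a
    per-file mapping (None when the stem has no part at that index).
    """
    file_to_base: Dict[Path, Optional[str]] = {}
    for path in file_paths:
        parts = path.stem.split(separator)
        if 0 <= base_name_index < len(parts):
            file_to_base[path] = parts[base_name_index]
        else:
            file_to_base[path] = None
    distinct = sorted({name for name in file_to_base.values() if name is not None})
    return distinct, file_to_base
```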
2. **Create New Method `_determine_single_asset_metadata`:**
* **Input:** Takes a specific `asset_base_name` (string) and the list of `classified_files` *filtered* for that asset.
* **Logic:** Contains the logic previously in `_determine_base_metadata` for determining `asset_category` and `archetype` based *only* on the files associated with the given `asset_base_name`.
* **Output:** Returns a dictionary containing `{"asset_category": str, "archetype": str}` for the specific asset.
3. **Modify `_inventory_and_classify_files`:**
* No major changes needed here initially, as it classifies based on file patterns independently of the final asset name. However, ensure the `classified_files` structure remains suitable for later filtering.
4. **Refactor `AssetProcessor.process` Method:**
* Change the overall flow to handle multiple assets.
* **Steps:**
1. `_setup_workspace()`
2. `_extract_input()`
3. `_inventory_and_classify_files()` -> Get initial `self.classified_files` (all files).
4. Call the *new* `_determine_base_metadata()` using all relevant files -> Get `distinct_base_names` list and `file_to_base_name_map`.
5. Initialize an overall status dictionary (e.g., `{"processed": [], "skipped": [], "failed": []}`).
6. **Loop** through each `current_asset_name` in `distinct_base_names`:
* Log the start of processing for `current_asset_name`.
* **Filter Files:** Create temporary filtered lists of maps, models, etc., from `self.classified_files` based on the `file_to_base_name_map` for the `current_asset_name`.
* **Determine Metadata:** Call `_determine_single_asset_metadata(current_asset_name, filtered_files)` -> Get category/archetype for this asset. Store these along with `current_asset_name` and supplier name in a temporary `current_asset_metadata` dict.
* **Skip Check:** Perform the skip check logic specifically for `current_asset_name` using the `output_base_path`, supplier name, and `current_asset_name`. If skipped, update overall status and `continue` to the next asset name.
* **Process:** Call `_process_maps()`, `_merge_maps()`, passing the *filtered* file lists and potentially the `current_asset_metadata`. These methods need to operate only on the provided subset of files.
* **Generate Metadata:** Call `_generate_metadata_file()`, passing the `current_asset_metadata` and the results from map/merge processing for *this asset*. This method will now write `metadata.json` specific to `current_asset_name`.
* **Organize Output:** Call `_organize_output_files()`, passing the `current_asset_name`. This method needs modification:
* It will move the processed files for the *current asset* to the correct subfolder (`<output_base>/<supplier>/<current_asset_name>/`).
* It will also identify files from the *original* input whose base name was `None` in the `file_to_base_name_map` (the "unmatched" files).
* It will copy these "unmatched" files into the `Extra/` subfolder for the *current asset being processed in this loop iteration*.
* Update overall status based on the success/failure of this asset's processing.
7. `_cleanup_workspace()` (only after processing all assets from the input).
8. **Return:** Return the overall status dictionary summarizing results across all detected assets.
5. **Adapt `_process_maps`, `_merge_maps`, `_generate_metadata_file`, `_organize_output_files`:**
* Ensure these methods accept and use the filtered file lists and the specific `asset_name` for the current iteration.
* `_organize_output_files` needs the logic to handle copying the "unmatched" files into the current asset's `Extra/` folder.
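The per-asset filtering and the "unmatched files into `Extra/`" handling described in the loop could be sketched as follows. The flat file list and the two category keys are illustrative assumptions, not the tool's actual `classified_files` structure:

```python
from pathlib import Path
from typing import Dict, List, Optional

def split_files_for_asset(
    all_files: List[Path],
    file_to_base: Dict[Path, Optional[str]],
    current_asset_name: str,
) -> Dict[str, List[Path]]:
    """Partition the input files for one loop iteration.

    'asset' files are processed and moved for the current asset;
    'unmatched' files (base name None) are copied into the current
    asset's Extra/ folder.
    """
    return {
        "asset": [f for f in all_files if file_to_base.get(f) == current_asset_name],
        "unmatched": [f for f in all_files if file_to_base.get(f) is None],
    }
```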
**Phase 2: Update Orchestration (`main.py`, `gui/processing_handler.py`)**
1. **Modify `main.process_single_asset_wrapper`:**
* The call `processor.process()` will now return the overall status dictionary.
* The wrapper needs to interpret this dictionary to return a single representative status ("processed" if any succeeded, "skipped" if all skipped, "failed" if any failed) and potentially a consolidated error message for the main loop/GUI.
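A sketch of that consolidation rule, assuming the overall status dict maps status keys to lists of asset names and that "failed" takes precedence over "processed" (the precedence is an assumption; the ticket only lists the three cases):

```python
from typing import Dict, List

def consolidate_status(overall: Dict[str, List[str]]) -> str:
    """Collapse the per-asset status dict into one representative status:
    'failed' if any asset failed, else 'processed' if any succeeded,
    else 'skipped' (everything was skipped)."""
    if overall.get("failed"):
        return "failed"
    if overall.get("processed"):
        return "processed"
    return "skipped"
```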
2. **Modify `gui.processing_handler.ProcessingHandler.run`:**
* No major changes needed here, as it relies on `process_single_asset_wrapper`. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future, but for now, the overall status from the wrapper should suffice.
**Phase 3: Update GUI Prediction (`asset_processor.py`, `gui/prediction_handler.py`, `gui/main_window.py`)**
1. **Modify `AssetProcessor.get_detailed_file_predictions`:**
* This method must now perform the multi-asset detection:
* Call the refactored `_determine_base_metadata` to get the `distinct_base_names` and `file_to_base_name_map`.
* Iterate through all classified files (maps, models, extra, ignored).
* For each file, look up its corresponding base name in the `file_to_base_name_map`.
* The returned dictionary for each file should now include:
* `original_path`: str
* `predicted_asset_name`: str | None (The base name determined for this file, or None if unmatched)
* `predicted_output_name`: str | None (The predicted final filename, e.g., `AssetName_Color_4K.png`, or original name for models/extra)
* `status`: str ("Mapped", "Model", "Extra", "Unrecognised", "Ignored", **"Unmatched Extra"** - new status for files with `None` base name).
* `details`: str | None
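The per-file prediction entry could be pinned down with a `TypedDict` — a sketch of the shape listed above, not the tool's actual type:

```python
from typing import Optional, TypedDict

class FilePrediction(TypedDict):
    original_path: str
    predicted_asset_name: Optional[str]   # base name for this file, None if unmatched
    predicted_output_name: Optional[str]  # e.g. "AssetName_Color_4K.png"
    status: str     # "Mapped", "Model", "Extra", "Unrecognised", "Ignored", "Unmatched Extra"
    details: Optional[str]
```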
2. **Update `gui.prediction_handler.PredictionHandler`:**
* Ensure it correctly passes the results from `get_detailed_file_predictions` (including the new `predicted_asset_name` and `status` values) back to the main window via signals.
3. **Update `gui.main_window.MainWindow`:**
* Modify the preview table model/delegate to display the `predicted_asset_name`. A new column might be needed.
* Update the logic that colors rows or displays status icons to handle the new "Unmatched Extra" status distinctly from regular "Extra" or "Unrecognised".
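One way to keep the new status visually distinct in the preview table; the colour values are placeholders — the point is only that "Unmatched Extra" gets its own entry rather than reusing the "Extra" or "Unrecognised" colour:

```python
# Hypothetical status → row-colour mapping for the preview table.
STATUS_COLOURS = {
    "Mapped": "#c8e6c9",
    "Model": "#bbdefb",
    "Extra": "#fff9c4",
    "Unmatched Extra": "#ffe0b2",
    "Unrecognised": "#ffcdd2",
    "Ignored": "#e0e0e0",
}

def row_colour(status: str) -> str:
    """Return the row colour for a status, defaulting to white."""
    return STATUS_COLOURS.get(status, "#ffffff")
```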
**Visual Plan (`AssetProcessor.process` Sequence)**
```mermaid
sequenceDiagram
    participant Client as Orchestrator (main.py / GUI Handler)
    participant AP as AssetProcessor
    participant Config as Configuration
    participant FS as File System
    Client->>AP: process(input_path, config, output_base, overwrite)
    AP->>AP: _setup_workspace()
    AP->>FS: Create temp_dir
    AP->>AP: _extract_input()
    AP->>FS: Extract/Copy files to temp_dir
    AP->>AP: _inventory_and_classify_files()
    AP-->>AP: self.classified_files (all files)
    AP->>AP: _determine_base_metadata()
    AP-->>AP: distinct_base_names, file_to_base_name_map
    AP->>AP: Initialize overall_status = {}
    loop For each current_asset_name in distinct_base_names
        AP->>AP: Log start for current_asset_name
        AP->>AP: Filter self.classified_files using file_to_base_name_map
        AP-->>AP: filtered_files_for_asset
        AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)
        AP-->>AP: current_asset_metadata (category, archetype)
        AP->>AP: Perform Skip Check for current_asset_name
        alt Skip Check == True
            AP->>AP: Update overall_status (skipped)
            AP->>AP: continue loop
        end
        AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: processed_map_details_asset
        AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: merged_map_details_asset
        AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)
        AP->>FS: Write metadata.json for current_asset_name
        AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)
        AP->>FS: Move processed files for current_asset_name
        AP->>FS: Copy unmatched files to Extra/ for current_asset_name
        AP->>AP: Update overall_status (processed/failed for this asset)
    end
    AP->>AP: _cleanup_workspace()
    AP->>FS: Delete temp_dir
    AP-->>Client: Return overall_status dictionary
```