Asset-Frameworker/Tickets/Resolved/FEAT-004-handle-multi-asset-inputs.md
2025-04-29 18:26:13 +02:00


ID: FEAT-004
Type: Feature
Status: Complete
Priority: Medium
Labels: core, gui, cli, feature, enhancement
Created: 2025-04-22
Updated: 2025-04-22
Related: (none)

[FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index

Description

Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the source_naming.part_indices.base_name rule in the preset), the tool's fallback logic uses os.path.commonprefix to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.

Current Behavior

When processing an input containing files from multiple assets (e.g., 3-HeartOak... and 3-Oak-Classic... in the same ZIP), the _determine_base_metadata method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., "3-"). The processing pipeline and GUI prediction then proceed using this incorrect name.
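The truncation can be reproduced in isolation. The stems below stand in for the example file names above (the `_Color_4K` suffixes are illustrative):

```python
import os.path

# Stems from two distinct assets mixed in one input (suffixes are illustrative).
stems = ["3-HeartOak_Color_4K", "3-Oak-Classic_Color_4K"]

# The current fallback collapses both assets into one truncated "asset name".
print(os.path.commonprefix(stems))  # → 3-
```

`os.path.commonprefix` works character by character, not path-segment by path-segment, so any divergence after the shared `3-` prefix yields the nonsense name seen in practice.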

Desired Behavior / Goals

The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the source_naming.part_indices.base_name rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete metadata.json file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.

Implementation Notes (Optional)

  • Modify AssetProcessor._determine_base_metadata to return a list of distinct base names and a mapping of files to their determined base names.
  • Adjust the main processing orchestration (main.py, gui/processing_handler.py) to iterate over the list of distinct base names returned by _determine_base_metadata.
  • For each distinct base name, create a new processing context (potentially a new AssetProcessor instance or a modified approach) that operates only on the files associated with that specific base name.
  • Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.
  • Update AssetProcessor.get_detailed_file_predictions to correctly identify and group files by distinct base names for accurate GUI preview display.
  • Consider edge cases: what if some files don't match any determined base name? (They should likely still go to 'Extra/'.) What if the index method yields no names? (Fall back to the input name, as the tool does currently.)
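The notes above can be sketched as a standalone helper. The `separator` and `base_name_index` values here are hypothetical; in the real tool they come from the preset's `source_naming` configuration:

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

def determine_base_names(
    files: List[Path],
    separator: str,
    base_name_index: int,
) -> Tuple[List[str], Dict[Path, Optional[str]]]:
    """Group files by the base name found at `base_name_index` of each split stem."""
    mapping: Dict[Path, Optional[str]] = {}
    for f in files:
        parts = f.stem.split(separator)
        # Stems with fewer parts than the index have no determinable base name.
        mapping[f] = parts[base_name_index] if len(parts) > base_name_index else None
    distinct = sorted({n for n in mapping.values() if n is not None})
    return distinct, mapping

names, mapping = determine_base_names(
    [Path("3-HeartOak_Color_4K.png"), Path("3-Oak-Classic_Color_4K.png")],
    separator="_",
    base_name_index=0,
)
print(names)  # → ['3-HeartOak', '3-Oak-Classic']
```

Returning both the distinct names and the per-file mapping lets the caller drive one processing pass per asset while still knowing which files were left unmatched (`None`).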

Acceptance Criteria (Optional)

  • Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with base_name_index results in two separate output directories (<output_base>/<supplier>/AssetA/ and <output_base>/<supplier>/AssetB/), each containing the correctly processed files and metadata for that asset.
  • The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).
  • The CLI processing of a multi-asset input correctly processes and outputs each asset separately.
  • The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').
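The first criterion can be expressed as a small check suitable for an integration test. The supplier name and directory layout follow the `<output_base>/<supplier>/<asset>/` structure described above; everything else is illustrative:

```python
from pathlib import Path
from typing import List

def check_multi_asset_output(
    output_base: Path, supplier: str, expected_assets: List[str]
) -> bool:
    """Acceptance check: one output folder plus a metadata.json per detected asset."""
    for asset in expected_assets:
        asset_dir = output_base / supplier / asset
        if not asset_dir.is_dir() or not (asset_dir / "metadata.json").is_file():
            return False
    return True
```

A test would process a two-asset ZIP, then assert `check_multi_asset_output(out, "SupplierX", ["AssetA", "AssetB"])`.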

Implementation Plan (Generated by Architect Mode)

Goal: Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the source_naming.part_indices.base_name rule, placing unmatched files into the Extra/ folder of each processed asset.

Phase 1: Core Logic Refactoring (asset_processor.py)

  1. Refactor _determine_base_metadata:

    • Input: Takes the list of all file paths (relative to temp dir) found after extraction.
    • Logic:
      • Iterates through relevant file stems (maps, models).
      • Uses the source_naming_separator and source_naming_indices['base_name'] to extract potential base names for each file stem.
      • Identifies the set of distinct base names found across all files.
      • Creates a mapping: Dict[Path, Optional[str]] where keys are relative file paths and values are the determined base name string (or None if a file doesn't match any base name according to the index rule).
    • Output: Returns a tuple: (distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]]).
    • Remove: Logic setting self.metadata["asset_name"], asset_category, and archetype.
  2. Create New Method _determine_single_asset_metadata:

    • Input: Takes a specific asset_base_name (string) and the list of classified_files filtered for that asset.
    • Logic: Contains the logic previously in _determine_base_metadata for determining asset_category and archetype based only on the files associated with the given asset_base_name.
    • Output: Returns a dictionary containing {"asset_category": str, "archetype": str} for the specific asset.
  3. Modify _inventory_and_classify_files:

    • No major changes needed here initially, as it classifies based on file patterns independently of the final asset name. However, ensure the classified_files structure remains suitable for later filtering.
  4. Refactor AssetProcessor.process Method:

    • Change the overall flow to handle multiple assets.
    • Steps:
      1. _setup_workspace()
      2. _extract_input()
      3. _inventory_and_classify_files() -> Get initial self.classified_files (all files).
      4. Call the new _determine_base_metadata() using all relevant files -> Get distinct_base_names list and file_to_base_name_map.
      5. Initialize an overall status dictionary (e.g., {"processed": [], "skipped": [], "failed": []}).
      6. Loop through each current_asset_name in distinct_base_names:
        • Log the start of processing for current_asset_name.
        • Filter Files: Create temporary filtered lists of maps, models, etc., from self.classified_files based on the file_to_base_name_map for the current_asset_name.
        • Determine Metadata: Call _determine_single_asset_metadata(current_asset_name, filtered_files) -> Get category/archetype for this asset. Store these along with current_asset_name and supplier name in a temporary current_asset_metadata dict.
        • Skip Check: Perform the skip check logic specifically for current_asset_name using the output_base_path, supplier name, and current_asset_name. If skipped, update overall status and continue to the next asset name.
        • Process: Call _process_maps(), _merge_maps(), passing the filtered file lists and potentially the current_asset_metadata. These methods need to operate only on the provided subset of files.
        • Generate Metadata: Call _generate_metadata_file(), passing the current_asset_metadata and the results from map/merge processing for this asset. This method will now write metadata.json specific to current_asset_name.
        • Organize Output: Call _organize_output_files(), passing the current_asset_name. This method needs modification:
          • It will move the processed files for the current asset to the correct subfolder (<output_base>/<supplier>/<current_asset_name>/).
          • It will also identify files from the original input whose base name was None in the file_to_base_name_map (the "unmatched" files).
          • It will copy these "unmatched" files into the Extra/ subfolder for the current asset being processed in this loop iteration.
        • Update overall status based on the success/failure of this asset's processing.
      7. _cleanup_workspace() (only after processing all assets from the input).
      8. Return: Return the overall status dictionary summarizing results across all detected assets.
  5. Adapt _process_maps, _merge_maps, _generate_metadata_file, _organize_output_files:

    • Ensure these methods accept and use the filtered file lists and the specific asset_name for the current iteration.
    • _organize_output_files needs the logic to handle copying the "unmatched" files into the current asset's Extra/ folder.
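Steps 2 and 4 above can be sketched together. The classification keys, the category/archetype rules, and the `process_one` stand-in (which represents the skip check plus `_process_maps`/`_merge_maps`/`_generate_metadata_file`/`_organize_output_files`) are all hypothetical:

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

def determine_single_asset_metadata(
    asset_base_name: str,
    filtered_files: Dict[str, List[Path]],  # e.g. {"maps": [...], "models": [...]}
) -> Dict[str, str]:
    """Step 2: derive category/archetype from this asset's files only (rules hypothetical)."""
    category = "Model" if filtered_files.get("models") else "Surface"
    archetype = asset_base_name.split("-")[0]  # hypothetical: leading token of the base name
    return {"asset_category": category, "archetype": archetype}

def process_all_assets(
    distinct_base_names: List[str],
    file_to_base_name_map: Dict[Path, Optional[str]],
    process_one: Callable[[str, List[Path], List[Path]], str],  # -> "processed"/"skipped"/"failed"
) -> Dict[str, List[str]]:
    """Step 4's per-asset loop: filter, process, and collect an overall status dict."""
    overall: Dict[str, List[str]] = {"processed": [], "skipped": [], "failed": []}
    # Files with no determined base name are passed along so each asset's
    # _organize_output_files can copy them into its Extra/ folder.
    unmatched = [f for f, n in file_to_base_name_map.items() if n is None]
    for asset_name in distinct_base_names:
        asset_files = [f for f, n in file_to_base_name_map.items() if n == asset_name]
        result = process_one(asset_name, asset_files, unmatched)
        overall[result].append(asset_name)
    return overall
```

Note that workspace setup and cleanup stay outside this loop, matching steps 1 and 7.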

Phase 2: Update Orchestration (main.py, gui/processing_handler.py)

  1. Modify main.process_single_asset_wrapper:

    • The call processor.process() will now return the overall status dictionary.
    • The wrapper needs to interpret this dictionary to return a single representative status ("processed" if any succeeded, "skipped" if all skipped, "failed" if any failed) and potentially a consolidated error message for the main loop/GUI.
  2. Modify gui.processing_handler.ProcessingHandler.run:

    • No major changes needed here, as it relies on process_single_asset_wrapper. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future; for now, the overall status from the wrapper should suffice.
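The wrapper's status consolidation can be sketched as below. The ticket does not specify which status wins when one asset succeeds and another fails; the precedence here (failed over processed over skipped) is an assumption:

```python
from typing import Dict, List

def consolidate_status(overall: Dict[str, List[str]]) -> str:
    """Collapse the per-asset status dict into one representative status.

    Precedence is an assumption: any failure marks the whole input failed,
    otherwise any success marks it processed, otherwise everything was skipped.
    """
    if overall.get("failed"):
        return "failed"
    if overall.get("processed"):
        return "processed"
    return "skipped"
```

A consolidated error message could similarly join the names in `overall["failed"]`.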

Phase 3: Update GUI Prediction (asset_processor.py, gui/prediction_handler.py, gui/main_window.py)

  1. Modify AssetProcessor.get_detailed_file_predictions:

    • This method must now perform the multi-asset detection:
      • Call the refactored _determine_base_metadata to get the distinct_base_names and file_to_base_name_map.
      • Iterate through all classified files (maps, models, extra, ignored).
      • For each file, look up its corresponding base name in the file_to_base_name_map.
    • The returned dictionary for each file should now include:
      • original_path: str
      • predicted_asset_name: str | None (The base name determined for this file, or None if unmatched)
      • predicted_output_name: str | None (The predicted final filename, e.g., AssetName_Color_4K.png, or original name for models/extra)
      • status: str ("Mapped", "Model", "Extra", "Unrecognised", "Ignored", "Unmatched Extra" - new status for files with None base name).
      • details: str | None
  2. Update gui.prediction_handler.PredictionHandler:

    • Ensure it correctly passes the results from get_detailed_file_predictions (including the new predicted_asset_name and status values) back to the main window via signals.
  3. Update gui.main_window.MainWindow:

    • Modify the preview table model/delegate to display the predicted_asset_name. A new column might be needed.
    • Update the logic that colors rows or displays status icons to handle the new "Unmatched Extra" status distinctly from regular "Extra" or "Unrecognised".
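The per-file prediction record from step 1 can be sketched as below. The classification labels, the status precedence, and keeping the original filename as the predicted output name (the real map-renaming rule lives elsewhere in the processor) are all assumptions:

```python
from pathlib import Path
from typing import Dict, Optional

def build_prediction(
    original_path: Path,
    base_name: Optional[str],
    classification: str,  # hypothetical labels: "map" | "model" | "extra" | "ignored"
) -> Dict[str, Optional[str]]:
    """One GUI preview row per file; naming and status rules are illustrative."""
    if classification == "ignored":
        status = "Ignored"
    elif base_name is None:
        status = "Unmatched Extra"  # new status introduced by this ticket
    else:
        status = {"map": "Mapped", "model": "Model", "extra": "Extra"}.get(
            classification, "Unrecognised"
        )
    return {
        "original_path": str(original_path),
        "predicted_asset_name": base_name,
        "predicted_output_name": original_path.name if base_name else None,
        "status": status,
        "details": None,
    }
```

The preview table can then group rows by `predicted_asset_name` and style "Unmatched Extra" rows distinctly.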

Visual Plan (AssetProcessor.process Sequence)

```mermaid
sequenceDiagram
    participant Client as Orchestrator (main.py / GUI Handler)
    participant AP as AssetProcessor
    participant Config as Configuration
    participant FS as File System

    Client->>AP: process(input_path, config, output_base, overwrite)
    AP->>AP: _setup_workspace()
    AP->>FS: Create temp_dir
    AP->>AP: _extract_input()
    AP->>FS: Extract/Copy files to temp_dir
    AP->>AP: _inventory_and_classify_files()
    AP-->>AP: self.classified_files (all files)
    AP->>AP: _determine_base_metadata()
    AP-->>AP: distinct_base_names, file_to_base_name_map

    AP->>AP: Initialize overall_status = {}
    loop For each current_asset_name in distinct_base_names
        AP->>AP: Log start for current_asset_name
        AP->>AP: Filter self.classified_files using file_to_base_name_map
        AP-->>AP: filtered_files_for_asset
        AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)
        AP-->>AP: current_asset_metadata (category, archetype)
        AP->>AP: Perform Skip Check for current_asset_name
        alt Skip Check == True
            AP->>AP: Update overall_status (skipped)
            AP->>AP: continue loop
        end
        AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: processed_map_details_asset
        AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)
        AP-->>AP: merged_map_details_asset
        AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)
        AP->>FS: Write metadata.json for current_asset_name
        AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)
        AP->>FS: Move processed files for current_asset_name
        AP->>FS: Copy unmatched files to Extra/ for current_asset_name
        AP->>AP: Update overall_status (processed/failed for this asset)
    end
    AP->>AP: _cleanup_workspace()
    AP->>FS: Delete temp_dir
    AP-->>Client: Return overall_status dictionary
```