| ID | Type | Status | Priority | Labels | Created | Updated | Related |
|---|---|---|---|---|---|---|---|
| FEAT-004 | Feature | Complete | Medium | | 2025-04-22 | 2025-04-22 | |
[FEAT-004]: Handle Multi-Asset Inputs Based on Source Naming Index
Description
Currently, when an input ZIP or folder contains files from multiple distinct assets (as identified by the source_naming.part_indices.base_name rule in the preset), the tool's fallback logic uses os.path.commonprefix to determine a single, often incorrect, asset name. This prevents the tool from correctly processing inputs containing multiple assets and leads to incorrect predictions in the GUI.
Current Behavior
When processing an input containing files from multiple assets (e.g., 3-HeartOak... and 3-Oak-Classic... in the same ZIP), the _determine_base_metadata method identifies multiple potential base names based on the configured index. It then falls back to calculating the common prefix of all relevant file stems, resulting in a truncated or incorrect asset name (e.g., "3-"). The processing pipeline and GUI prediction then proceed using this incorrect name.
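The truncation is easy to reproduce in isolation. The sketch below is only an illustration: the asset prefixes come from the example above, but the suffixes after them are invented, since the real file names are elided in this ticket.

```python
import os

# Hypothetical stems: only the "3-HeartOak" and "3-Oak-Classic" prefixes come
# from the ticket; the "_COL"/"_NRM" suffixes are invented for illustration.
stems = [
    "3-HeartOak_COL",
    "3-HeartOak_NRM",
    "3-Oak-Classic_COL",
]

# commonprefix compares character by character, not by name part,
# so it keeps only the leading characters that every stem shares.
fallback_name = os.path.commonprefix(stems)
print(repr(fallback_name))  # prints '3-'
```

Because the comparison is character-based, adding a second asset to the input can only shorten the result, never split it into two names.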
Desired Behavior / Goals
The tool should accurately detect when a single input (ZIP/folder) contains files belonging to multiple distinct assets, as defined by the source_naming.part_indices.base_name rule. For each distinct base name identified, the tool should process the corresponding subset of files as a separate, independent asset. This includes generating a correct output directory structure and a complete metadata.json file for each detected asset within the input. The GUI preview should also accurately reflect the presence of multiple assets and their predicted names.
Implementation Notes (Optional)
- Modify AssetProcessor._determine_base_metadata to return a list of distinct base names and a mapping of files to their determined base names.
- Adjust the main processing orchestration (main.py, gui/processing_handler.py) to iterate over the list of distinct base names returned by _determine_base_metadata.
- For each distinct base name, create a new processing context (potentially a new AssetProcessor instance or a modified approach) that operates only on the files associated with that specific base name.
- Ensure temporary workspace handling and cleanup correctly manage files for multiple assets from a single input.
- Update AssetProcessor.get_detailed_file_predictions to correctly identify and group files by distinct base names for accurate GUI preview display.
- Consider edge cases: what if some files don't match any determined base name? (They should likely still go to 'Extra/'). What if the index method yields no names? (Fallback to input name as currently).
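The first note can be sketched as a standalone function. This is a minimal illustration, not the actual method: group_by_base_name, its parameters, and the sample file names are hypothetical, and it treats every path as relevant, whereas the real method would look only at maps and models.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def group_by_base_name(
    file_paths: List[Path],
    separator: str,
    base_name_index: int,
) -> Tuple[List[str], Dict[Path, Optional[str]]]:
    """Hypothetical stand-in for the refactored _determine_base_metadata."""
    file_to_base_name: Dict[Path, Optional[str]] = {}
    distinct_base_names: List[str] = []
    for path in file_paths:
        parts = path.stem.split(separator)
        # A stem with too few parts has no base name under the index rule.
        in_range = 0 <= base_name_index < len(parts)
        name = parts[base_name_index] if in_range and parts[base_name_index] else None
        file_to_base_name[path] = name
        # Keep first-seen order so results are stable between runs.
        if name is not None and name not in distinct_base_names:
            distinct_base_names.append(name)
    return distinct_base_names, file_to_base_name


# Assumed naming scheme for the demo: <prefix>_<base name>_<map type>.
files = [
    Path("01_AssetA_Color.png"),
    Path("01_AssetB_Color.png"),
    Path("01_AssetA_Normal.png"),
    Path("notes.txt"),
]
names, mapping = group_by_base_name(files, separator="_", base_name_index=1)
print(names)
print(mapping[Path("notes.txt")])
```

Here notes.txt has no part at index 1, so it maps to None and would be a candidate for 'Extra/'.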
Acceptance Criteria (Optional)
- Processing a ZIP file containing files for two distinct assets (e.g., 'AssetA' and 'AssetB') using a preset with base_name_index results in two separate output directories (<output_base>/<supplier>/AssetA/ and <output_base>/<supplier>/AssetB/), each containing the correctly processed files and metadata for that asset.
- The GUI preview accurately lists the files from the multi-asset input and shows the correct predicted asset name for each file based on its determined base name (e.g., files belonging to 'AssetA' show 'AssetA' as the predicted name).
- The CLI processing of a multi-asset input correctly processes and outputs each asset separately.
- The tool handles cases where some files in a multi-asset input do not match any determined base name (e.g., they are correctly classified as 'Unrecognised' or 'Extra').
Implementation Plan (Generated by Architect Mode)
Goal: Modify the tool to correctly identify and process multiple distinct assets within a single input (ZIP/folder) based on the source_naming.part_indices.base_name rule, placing unmatched files into the Extra/ folder of each processed asset.
Phase 1: Core Logic Refactoring (asset_processor.py)
- Refactor _determine_base_metadata:
  - Input: Takes the list of all file paths (relative to temp dir) found after extraction.
  - Logic:
    - Iterates through relevant file stems (maps, models).
    - Uses the source_naming_separator and source_naming_indices['base_name'] to extract potential base names for each file stem.
    - Identifies the set of distinct base names found across all files.
    - Creates a mapping Dict[Path, Optional[str]] where keys are relative file paths and values are the determined base name string (or None if a file doesn't match any base name according to the index rule).
  - Output: Returns a tuple: (distinct_base_names: List[str], file_to_base_name_map: Dict[Path, Optional[str]]).
  - Remove: Logic setting self.metadata["asset_name"], asset_category, and archetype.
- Create New Method _determine_single_asset_metadata:
  - Input: Takes a specific asset_base_name (string) and the list of classified_files filtered for that asset.
  - Logic: Contains the logic previously in _determine_base_metadata for determining asset_category and archetype based only on the files associated with the given asset_base_name.
  - Output: Returns a dictionary containing {"asset_category": str, "archetype": str} for the specific asset.
- Modify _inventory_and_classify_files:
  - No major changes needed here initially, as it classifies based on file patterns independent of the final asset name. However, ensure the classified_files structure remains suitable for later filtering.
- Refactor AssetProcessor.process Method:
  - Change the overall flow to handle multiple assets.
  - Steps:
    - _setup_workspace()
    - _extract_input()
    - _inventory_and_classify_files() -> Get initial self.classified_files (all files).
    - Call the new _determine_base_metadata() using all relevant files -> Get distinct_base_names list and file_to_base_name_map.
    - Initialize an overall status dictionary (e.g., {"processed": [], "skipped": [], "failed": []}).
    - Loop through each current_asset_name in distinct_base_names:
      - Log the start of processing for current_asset_name.
      - Filter Files: Create temporary filtered lists of maps, models, etc., from self.classified_files based on the file_to_base_name_map for the current_asset_name.
      - Determine Metadata: Call _determine_single_asset_metadata(current_asset_name, filtered_files) -> Get category/archetype for this asset. Store these along with current_asset_name and supplier name in a temporary current_asset_metadata dict.
      - Skip Check: Perform the skip check logic specifically for current_asset_name using the output_base_path, supplier name, and current_asset_name. If skipped, update overall status and continue to the next asset name.
      - Process: Call _process_maps(), _merge_maps(), passing the filtered file lists and potentially the current_asset_metadata. These methods need to operate only on the provided subset of files.
      - Generate Metadata: Call _generate_metadata_file(), passing the current_asset_metadata and the results from map/merge processing for this asset. This method will now write metadata.json specific to current_asset_name.
      - Organize Output: Call _organize_output_files(), passing the current_asset_name. This method needs modification:
        - It will move the processed files for the current asset to the correct subfolder (<output_base>/<supplier>/<current_asset_name>/).
        - It will also identify files from the original input whose base name was None in the file_to_base_name_map (the "unmatched" files).
        - It will copy these "unmatched" files into the Extra/ subfolder for the current asset being processed in this loop iteration.
      - Update overall status based on the success/failure of this asset's processing.
    - _cleanup_workspace() (only after processing all assets from the input).
    - Return: Return the overall status dictionary summarizing results across all detected assets.
- Adapt _process_maps, _merge_maps, _generate_metadata_file, _organize_output_files:
  - Ensure these methods accept and use the filtered file lists and the specific asset_name for the current iteration.
  - _organize_output_files needs the logic to handle copying the "unmatched" files into the current asset's Extra/ folder.
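The per-asset loop in the refactored process method could look roughly like the sketch below. It is a simplified model under stated assumptions: process_all_assets, should_skip, and process_one are hypothetical names standing in for the skip check and the map/merge/metadata/organize steps, and catching exceptions so that one failing asset does not abort the others is an assumption, not something this plan prescribes.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional


def process_all_assets(
    distinct_base_names: List[str],
    file_to_base_name_map: Dict[Path, Optional[str]],
    should_skip: Callable[[str], bool],
    process_one: Callable[[str, List[Path], List[Path]], None],
) -> Dict[str, List[str]]:
    """Run the per-asset loop and collect an overall status dictionary."""
    overall_status: Dict[str, List[str]] = {"processed": [], "skipped": [], "failed": []}
    # Files with a None base name are handed to every asset for its Extra/ folder.
    unmatched = [p for p, name in file_to_base_name_map.items() if name is None]
    for current_asset_name in distinct_base_names:
        asset_files = [
            p for p, name in file_to_base_name_map.items() if name == current_asset_name
        ]
        if should_skip(current_asset_name):
            overall_status["skipped"].append(current_asset_name)
            continue
        try:
            process_one(current_asset_name, asset_files, unmatched)
            overall_status["processed"].append(current_asset_name)
        except Exception:
            # Assumption: record the failure and carry on with the next asset.
            overall_status["failed"].append(current_asset_name)
    return overall_status


# Demo with invented data: AssetB fails, AssetC is skipped.
demo_map: Dict[Path, Optional[str]] = {
    Path("01_AssetA_Color.png"): "AssetA",
    Path("01_AssetB_Color.png"): "AssetB",
    Path("01_AssetC_Color.png"): "AssetC",
    Path("notes.txt"): None,
}


def fake_process(name: str, asset_files: List[Path], unmatched_files: List[Path]) -> None:
    if name == "AssetB":
        raise RuntimeError("simulated failure")


status = process_all_assets(
    ["AssetA", "AssetB", "AssetC"],
    demo_map,
    should_skip=lambda name: name == "AssetC",
    process_one=fake_process,
)
print(status)
```

Workspace setup and cleanup are left out on purpose; in the plan they wrap the whole loop, so cleanup runs once after the last asset.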
Phase 2: Update Orchestration (main.py, gui/processing_handler.py)
- Modify main.process_single_asset_wrapper:
  - The call processor.process() will now return the overall status dictionary.
  - The wrapper needs to interpret this dictionary to return a single representative status ("processed" if any succeeded, "skipped" if all skipped, "failed" if any failed) and potentially a consolidated error message for the main loop/GUI.
- Modify gui.processing_handler.ProcessingHandler.run:
  - No major changes needed here, as it relies on process_single_asset_wrapper. The status updates emitted back to the GUI might need slight adjustments if more detailed per-asset status is desired in the future, but for now, the overall status from the wrapper should suffice.
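One way the wrapper might collapse the dictionary into a single status is sketched below. The plan does not say which status wins when some assets succeed and others fail, so the precedence here ("failed" over "processed" over "skipped") is an assumption, as are the function name summarize_status and the wording of the message.

```python
from typing import Dict, List, Optional, Tuple


def summarize_status(overall_status: Dict[str, List[str]]) -> Tuple[str, Optional[str]]:
    """Collapse per-asset results into one status and an optional message."""
    failed = overall_status.get("failed", [])
    if failed:
        # Assumption: any failure takes precedence so errors are never hidden.
        return "failed", "Failed assets: " + ", ".join(failed)
    if overall_status.get("processed"):
        return "processed", None
    # Assumption: nothing processed and nothing failed is reported as skipped.
    return "skipped", None


print(summarize_status({"processed": ["AssetA"], "skipped": [], "failed": ["AssetB"]}))
print(summarize_status({"processed": ["AssetA"], "skipped": ["AssetC"], "failed": []}))
print(summarize_status({"processed": [], "skipped": ["AssetC"], "failed": []}))
```

If partial success should instead be reported as "processed", only the order of the first two checks needs to change.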
Phase 3: Update GUI Prediction (asset_processor.py, gui/prediction_handler.py, gui/main_window.py)
- Modify AssetProcessor.get_detailed_file_predictions:
  - This method must now perform the multi-asset detection:
    - Call the refactored _determine_base_metadata to get the distinct_base_names and file_to_base_name_map.
  - Iterate through all classified files (maps, models, extra, ignored).
  - For each file, look up its corresponding base name in the file_to_base_name_map.
  - The returned dictionary for each file should now include:
    - original_path: str
    - predicted_asset_name: str | None (The base name determined for this file, or None if unmatched)
    - predicted_output_name: str | None (The predicted final filename, e.g., AssetName_Color_4K.png, or original name for models/extra)
    - status: str ("Mapped", "Model", "Extra", "Unrecognised", "Ignored", "Unmatched Extra" - new status for files with None base name).
    - details: str | None
- Update gui.prediction_handler.PredictionHandler:
  - Ensure it correctly passes the results from get_detailed_file_predictions (including the new predicted_asset_name and status values) back to the main window via signals.
- Update gui.main_window.MainWindow:
  - Modify the preview table model/delegate to display the predicted_asset_name. A new column might be needed.
  - Update the logic that colors rows or displays status icons to handle the new "Unmatched Extra" status distinctly from regular "Extra" or "Unrecognised".
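The per-file prediction record described in this phase can be written down as a typed dictionary. The sketch below is illustrative only: FilePrediction and build_prediction are hypothetical names, and leaving "Ignored" files as "Ignored" even when they have no base name is an assumption the plan does not settle.

```python
from pathlib import Path
from typing import Dict, Optional, TypedDict


class FilePrediction(TypedDict):
    original_path: str
    predicted_asset_name: Optional[str]
    predicted_output_name: Optional[str]
    status: str
    details: Optional[str]


def build_prediction(
    path: Path,
    classified_status: str,
    file_to_base_name_map: Dict[Path, Optional[str]],
    predicted_output_name: Optional[str] = None,
) -> FilePrediction:
    """Attach the per-file asset name and derive the preview status."""
    asset_name = file_to_base_name_map.get(path)
    status = classified_status
    # Assumption: ignored files keep their status; other files without a
    # base name are flagged with the new "Unmatched Extra" status.
    if asset_name is None and classified_status != "Ignored":
        status = "Unmatched Extra"
    return {
        "original_path": path.as_posix(),
        "predicted_asset_name": asset_name,
        "predicted_output_name": predicted_output_name,
        "status": status,
        "details": None,
    }


# Demo with invented file names and an invented predicted output name.
preview_map: Dict[Path, Optional[str]] = {
    Path("01_AssetA_Color.png"): "AssetA",
    Path("notes.txt"): None,
}
matched = build_prediction(
    Path("01_AssetA_Color.png"), "Mapped", preview_map, "AssetA_Color_4K.png"
)
unmatched = build_prediction(Path("notes.txt"), "Extra", preview_map, "notes.txt")
print(matched["predicted_asset_name"], matched["status"])
print(unmatched["predicted_asset_name"], unmatched["status"])
```

A fixed record shape like this would let the preview table add a predicted_asset_name column without inspecting each entry for missing keys.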
Visual Plan (AssetProcessor.process Sequence)
sequenceDiagram
participant Client as Orchestrator (main.py / GUI Handler)
participant AP as AssetProcessor
participant Config as Configuration
participant FS as File System
Client->>AP: process(input_path, config, output_base, overwrite)
AP->>AP: _setup_workspace()
AP->>FS: Create temp_dir
AP->>AP: _extract_input()
AP->>FS: Extract/Copy files to temp_dir
AP->>AP: _inventory_and_classify_files()
AP-->>AP: self.classified_files (all files)
AP->>AP: _determine_base_metadata()
AP-->>AP: distinct_base_names, file_to_base_name_map
AP->>AP: Initialize overall_status = {}
loop For each current_asset_name in distinct_base_names
AP->>AP: Log start for current_asset_name
AP->>AP: Filter self.classified_files using file_to_base_name_map
AP-->>AP: filtered_files_for_asset
AP->>AP: _determine_single_asset_metadata(current_asset_name, filtered_files_for_asset)
AP-->>AP: current_asset_metadata (category, archetype)
AP->>AP: Perform Skip Check for current_asset_name
alt Skip Check == True
AP->>AP: Update overall_status (skipped)
AP->>AP: continue loop
end
AP->>AP: _process_maps(filtered_files_for_asset, current_asset_metadata)
AP-->>AP: processed_map_details_asset
AP->>AP: _merge_maps(filtered_files_for_asset, current_asset_metadata)
AP-->>AP: merged_map_details_asset
AP->>AP: _generate_metadata_file(current_asset_metadata, processed_map_details_asset, merged_map_details_asset)
AP->>FS: Write metadata.json for current_asset_name
AP->>AP: _organize_output_files(current_asset_name, file_to_base_name_map)
AP->>FS: Move processed files for current_asset_name
AP->>FS: Copy unmatched files to Extra/ for current_asset_name
AP->>AP: Update overall_status (processed/failed for this asset)
end
AP->>AP: _cleanup_workspace()
AP->>FS: Delete temp_dir
AP-->>Client: Return overall_status dictionary