
# Project Plan: Modularizing the Asset Processing Engine
**Last Updated:** May 9, 2025

**1. Project Vision & Goals**

* **Vision:** Transform the asset processing pipeline into a highly modular, extensible, and testable system.
* **Primary Goals:**
    1. Decouple processing steps into independent, reusable stages.
    2. Simplify the addition of new processing capabilities (e.g., GLOSS-to-ROUGH conversion, alpha-to-MASK extraction, normal map green-channel inversion).
    3. Improve code maintainability and readability.
    4. Make each processing component easier to cover with unit and integration tests.
    5. Centralize common utility functions (image manipulation, path generation).

**2. Proposed Architecture Overview**
* **Core Concept:** A `PipelineOrchestrator` manages an ordered sequence of `ProcessingStage`s. Each stage operates on an `AssetProcessingContext` object, which carries all the data and state for a single asset through the pipeline.
* **Key Components:**
    * `AssetProcessingContext`: Data class holding asset-specific data, configuration, temporary paths, and status.
    * `PipelineOrchestrator`: Class that manages the overall processing flow for a `SourceRule`, iterating through its assets and running the pipeline of stages for each one.
    * `ProcessingStage` (base class/interface): Defines the contract for all individual processing stages (an `execute(context)` method).
    * Specific stage classes (e.g., `SupplierDeterminationStage`, `IndividualMapProcessingStage`).
    * Utility modules: a new `image_processing_utils.py` and enhancements to `utils/path_utils.py`.
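
The core concept can be illustrated with a minimal, self-contained sketch. It is not the planned implementation: the context is cut down to hypothetical fields (`asset_name`, `status_flags`, `log`), `RecordStage`, `SkipIfFlaggedStage`, and `run_stages` are hypothetical names, and storing the skip decision as `status_flags["skip_asset"]` is an assumption. Having `execute` return the context keeps the driving loop uniform, and the early exit mirrors the skip branch in the workflow diagram.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssetProcessingContext:
    # Hypothetical reduced context; the planned class carries many more fields.
    asset_name: str
    status_flags: Dict[str, bool] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


class ProcessingStage(ABC):
    """Contract shared by every stage: take a context, return it (possibly updated)."""

    @abstractmethod
    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        ...


class RecordStage(ProcessingStage):
    # Hypothetical stage that only records that it ran.
    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        context.log.append(self.name)
        return context


class SkipIfFlaggedStage(ProcessingStage):
    # Hypothetical skip rule: assets whose names start with "_" are skipped.
    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        if context.asset_name.startswith("_"):
            context.status_flags["skip_asset"] = True
        return context


def run_stages(stages: List[ProcessingStage], context: AssetProcessingContext) -> AssetProcessingContext:
    """Pass one context through the stages in order, stopping early if a stage flags a skip."""
    for stage in stages:
        context = stage.execute(context)
        if context.status_flags.get("skip_asset"):
            break
    return context
```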

**3. Proposed File Structure**

* `processing/`
    * `pipeline/`
        * `__init__.py`
        * `asset_context.py` (defines `AssetProcessingContext`)
        * `orchestrator.py` (defines `PipelineOrchestrator`)
        * `stages/`
            * `__init__.py`
            * `base_stage.py` (defines the `ProcessingStage` interface)
            * `supplier_determination.py`
            * `asset_skip_logic.py`
            * `metadata_initialization.py`
            * `file_rule_filter.py`
            * `gloss_to_rough_conversion.py`
            * `alpha_extraction_to_mask.py`
            * `normal_map_green_channel.py`
            * `individual_map_processing.py`
            * `map_merging.py`
            * `metadata_finalization.py`
            * `output_organization.py`
    * `utils/`
        * `__init__.py`
        * `image_processing_utils.py` (new module for image functions)
* `utils/` (existing top-level directory)
    * `path_utils.py` (to be enhanced with `sanitize_filename`, moved from `_sanitize_filename` in `processing_engine.py`)

**4. Detailed Phases and Tasks**

**Phase 0: Setup & Core Structures Definition**

*Goal: Establish the foundational classes for the new pipeline.*

* **Task 0.1: Define `AssetProcessingContext`**
    * Create `processing/pipeline/asset_context.py`.
    * Define the `AssetProcessingContext` data class with the fields `source_rule: SourceRule`, `asset_rule: AssetRule`, `workspace_path: Path`, `engine_temp_dir: Path`, `output_base_path: Path`, `effective_supplier: Optional[str]`, `asset_metadata: Dict`, `processed_maps_details: Dict[str, Dict[str, Dict]]`, `merged_maps_details: Dict[str, Dict[str, Dict]]`, `files_to_process: List[FileRule]`, `loaded_data_cache: Dict`, `config_obj: Configuration`, `status_flags: Dict`, `incrementing_value: Optional[str]`, and `sha5_value: Optional[str]`.
    * Ensure complete type hints on all fields.
* **Task 0.2: Define `ProcessingStage` Base Class/Interface**
    * Create `processing/pipeline/stages/base_stage.py`.
    * Define an abstract base class `ProcessingStage` with the abstract method `execute(self, context: AssetProcessingContext) -> AssetProcessingContext`.
* **Task 0.3: Implement Initial `PipelineOrchestrator`**
    * Create `processing/pipeline/orchestrator.py`.
    * Define the `PipelineOrchestrator` class.
    * Implement `__init__(self, config_obj: Configuration, stages: List[ProcessingStage])`.
    * Implement `process_source_rule(self, source_rule: SourceRule, workspace_path: Path, output_base_path: Path, overwrite: bool, incrementing_value: Optional[str], sha5_value: Optional[str]) -> Dict[str, List[str]]`, which:
        * Creates the main engine temporary directory and cleans it up afterwards.
        * Loops through `source_rule.assets`, initializing an `AssetProcessingContext` for each asset.
        * Iterates over `self.stages`, calling `stage.execute(context)` for each stage.
        * Collects the overall status and returns it as the `Dict[str, List[str]]` result.
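
A minimal sketch of the Task 0.3 loop, under simplifying assumptions: `SourceRule` and `AssetRule` are hypothetical stand-in dataclasses, the context keeps only a subset of the Task 0.1 fields, a `typing.Protocol` stands in for the `ProcessingStage` ABC, `config_obj`, `overwrite`, `incrementing_value`, and `sha5_value` are omitted, and the result keys (`processed`, `skipped`, `failed`) and the `skip_asset` flag are assumptions. It shows the engine temp directory being cleaned up even on errors, one fresh context per asset, and a failing asset being recorded instead of aborting the whole rule.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Protocol


@dataclass
class AssetRule:  # hypothetical stand-in for the project's AssetRule
    asset_name: str


@dataclass
class SourceRule:  # hypothetical stand-in for the project's SourceRule
    assets: List[AssetRule]


@dataclass
class AssetProcessingContext:
    # Subset of the Task 0.1 fields.
    source_rule: SourceRule
    asset_rule: AssetRule
    workspace_path: Path
    engine_temp_dir: Path
    output_base_path: Path
    effective_supplier: Optional[str] = None
    status_flags: Dict[str, bool] = field(default_factory=dict)


class ProcessingStage(Protocol):
    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext: ...


class PipelineOrchestrator:
    def __init__(self, stages: List[ProcessingStage]) -> None:
        self.stages = stages

    def process_source_rule(
        self, source_rule: SourceRule, workspace_path: Path, output_base_path: Path
    ) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {"processed": [], "skipped": [], "failed": []}
        # TemporaryDirectory deletes the engine temp dir on exit, even if an exception escapes.
        with tempfile.TemporaryDirectory(prefix="engine_") as tmp:
            engine_temp_dir = Path(tmp)
            for asset_rule in source_rule.assets:
                context = AssetProcessingContext(
                    source_rule=source_rule,
                    asset_rule=asset_rule,
                    workspace_path=workspace_path,
                    engine_temp_dir=engine_temp_dir,
                    output_base_path=output_base_path,
                )
                try:
                    for stage in self.stages:
                        context = stage.execute(context)
                        if context.status_flags.get("skip_asset"):
                            break  # remaining stages are not run for skipped assets
                except Exception:
                    # Record the failure and move on to the next asset.
                    results["failed"].append(asset_rule.asset_name)
                    continue
                bucket = "skipped" if context.status_flags.get("skip_asset") else "processed"
                results[bucket].append(asset_rule.asset_name)
        return results
```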

**Phase 1: Utility Module Refactoring**

*Goal: Consolidate and centralize common utility functions.*

* **Task 1.1: Refactor Path Utilities**
    * Move `_sanitize_filename` from `processing_engine.py` to `utils/path_utils.py`, exposing it as the public function `sanitize_filename`.
    * Update all call sites to use the new utility function.
* **Task 1.2: Create `image_processing_utils.py`**
    * Create `processing/utils/image_processing_utils.py`.
    * Move these general-purpose image functions from `processing_engine.py`:
        * `is_power_of_two`
        * `get_nearest_pot`
        * `calculate_target_dimensions`
        * `calculate_image_stats`
        * `normalize_aspect_ratio_change`
        * Core image loading, BGR↔RGB conversion, and generic resizing (from `_load_and_transform_source`).
        * Core data type conversion, color conversion for saving, and the `cv2.imwrite` call (from `_save_image`).
    * Keep the functions free of hidden state so each can be unit tested in isolation; confine file I/O to the loading and saving helpers.
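
The smallest of these helpers can be sketched directly. The bodies below are assumptions about their behavior, not the code currently in `processing_engine.py`: `get_nearest_pot` is assumed to round to the closer power of two (preferring the larger one on a tie), and `sanitize_filename` is assumed to replace characters that are invalid in Windows or POSIX filenames with `_` and strip trailing dots and spaces.

```python
import re


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; a positive power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def get_nearest_pot(n: int) -> int:
    """Nearest power of two to n; ties go to the larger value (assumed behavior)."""
    if n <= 1:
        return 1
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1  # next power of two above lower
    return lower if (n - lower) < (upper - n) else upper


# Characters rejected by Windows filenames, plus ASCII control characters (assumed rule set).
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace invalid characters and strip trailing dots/spaces; never return an empty name."""
    cleaned = _INVALID_CHARS.sub(replacement, name).rstrip(" .")
    return cleaned or replacement
```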

**Phase 2: Implementing Core Processing Stages (Migrating Existing Logic)**

*Goal: Migrate existing functionality from `processing_engine.py` into the new stage-based architecture.*

For each task: create the stage file, implement the stage class, move the existing logic, and adapt it to `AssetProcessingContext`.

* **Task 2.1: Implement `SupplierDeterminationStage`**
* **Task 2.2: Implement `AssetSkipLogicStage`**
* **Task 2.3: Implement `MetadataInitializationStage`**
* **Task 2.4: Implement `FileRuleFilterStage`** (new logic: filters out file rules whose `item_type == "FILE_IGNORE"`)
* **Task 2.5: Implement `IndividualMapProcessingStage`** (adapts `_process_individual_maps`; uses `image_processing_utils.py`)
* **Task 2.6: Implement `MapMergingStage`** (adapts `_merge_maps`; uses `image_processing_utils.py`)
* **Task 2.7: Implement `MetadataFinalizationAndSaveStage`** (adapts `_generate_metadata_file`; uses `utils.path_utils.generate_path_from_pattern`)
* **Task 2.8: Implement `OutputOrganizationStage`** (adapts `_organize_output_files`)
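
As an example of a migrated stage, `FileRuleFilterStage` could look roughly like this. `FileRule` and the context are reduced to hypothetical minimal dataclasses, the `ProcessingStage` base class is omitted to keep the sketch self-contained, and dropping ignored rules from `context.files_to_process` (rather than flagging them) is an assumption; only the `"FILE_IGNORE"` and `"MAP_COL"` values come from the plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileRule:  # hypothetical minimal stand-in for the project's FileRule
    file_path: str
    item_type: str


@dataclass
class AssetProcessingContext:  # only the field this stage touches
    files_to_process: List[FileRule] = field(default_factory=list)


class FileRuleFilterStage:
    """Removes file rules marked FILE_IGNORE so later stages never see them."""

    IGNORE_TYPE = "FILE_IGNORE"

    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        # Build a new list instead of mutating while iterating.
        context.files_to_process = [
            rule for rule in context.files_to_process if rule.item_type != self.IGNORE_TYPE
        ]
        return context
```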

**Phase 3: Implementing New Feature Stages**

*Goal: Add the new processing capabilities as distinct stages.*

* **Task 3.1: Implement `GlossToRoughConversionStage`** (identify gloss maps, convert them to roughness by inversion, save to a temp file, update the `FileRule`)
* **Task 3.2: Implement `AlphaExtractionToMaskStage`** (check for an existing mask; find a `MAP_COL` with an alpha channel, extract the alpha, save it to a temp file, and add a new `FileRule`)
* **Task 3.3: Implement `NormalMapGreenChannelStage`** (identify normal maps, invert the green channel based on the configuration, save to a temp file, update the `FileRule`)
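
The pixel operations behind these three stages are small. The sketch below uses NumPy, since OpenCV's Python bindings already return images as NumPy arrays. It assumes unsigned-integer images, OpenCV's BGR(A) channel order (green at index 1, alpha at index 3), and the convention `rough = max - gloss`; the function names are hypothetical, and loading, saving, and `FileRule` updates are left out.

```python
from typing import Optional

import numpy as np


def _max_value(image: np.ndarray) -> int:
    # Full-scale value for an unsigned integer dtype, e.g. 255 for uint8, 65535 for uint16.
    return int(np.iinfo(image.dtype).max)


def gloss_to_rough(gloss: np.ndarray) -> np.ndarray:
    """Roughness as the inverse of gloss (assumed convention: rough = max - gloss)."""
    return (_max_value(gloss) - gloss).astype(gloss.dtype)


def invert_green_channel(normal_bgr: np.ndarray) -> np.ndarray:
    """Flip the green (Y) channel, e.g. to switch between OpenGL and DirectX normal conventions."""
    out = normal_bgr.copy()  # leave the caller's array untouched
    out[..., 1] = _max_value(out) - out[..., 1]
    return out


def extract_alpha_as_mask(color_bgra: np.ndarray) -> Optional[np.ndarray]:
    """Return the alpha channel as a single-channel mask, or None if the image has no alpha."""
    if color_bgra.ndim != 3 or color_bgra.shape[2] != 4:
        return None
    return color_bgra[..., 3].copy()
```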

**Phase 4: Integration, Testing & Finalization**

*Goal: Assemble the pipeline, test it thoroughly, and deprecate the old code.*

* **Task 4.1: Configure `PipelineOrchestrator`**
    * Instantiate `PipelineOrchestrator` in the main application logic with the ordered list of stage instances.
* **Task 4.2: Unit Testing**
    * Unit tests for each `ProcessingStage`, using a mocked `AssetProcessingContext`.
    * Unit tests for the functions in `image_processing_utils.py` and `utils/path_utils.py`.
* **Task 4.3: Integration Testing**
    * Test `PipelineOrchestrator` end to end with sample data.
    * Compare its outputs with those of the existing engine for consistency.
* **Task 4.4: Documentation Update**
    * Update the developer documentation (e.g., `Documentation/02_Developer_Guide/05_Processing_Pipeline.md`).
    * Document `AssetProcessingContext` and each stage's responsibilities.
* **Task 4.5: Deprecate/Remove Old `ProcessingEngine` Code**
    * Gradually remove the refactored logic from `processing_engine.py`.
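
For the consistency check in Task 4.3, one simple approach is to hash every file under the old and new output directories and diff the results. This is a sketch, not part of the plan: `compare_output_trees` is a hypothetical helper, and byte-for-byte equality is a strict criterion. Re-encoded images can differ in bytes while being visually identical, so files reported as `different` may need a pixel-level follow-up.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def _tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's POSIX-style path relative to root to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_output_trees(old_root: Path, new_root: Path) -> Dict[str, List[str]]:
    """Report files present in only one tree and files whose contents differ."""
    old, new = _tree_digests(old_root), _tree_digests(new_root)
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "different": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```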

**5. Workflow Diagram**
```mermaid
graph TD
AA[Load SourceRule & Config] --> BA(PipelineOrchestrator: process_source_rule);
BA --> CA{For Each Asset in SourceRule};
CA -- Yes --> DA(Orchestrator: Create AssetProcessingContext);
DA --> EA(SupplierDeterminationStage);
EA -- context --> FA(AssetSkipLogicStage);
FA -- context --> GA{context.skip_asset?};
GA -- Yes --> HA(Orchestrator: Record Skipped);
HA --> CA;
GA -- No --> IA(MetadataInitializationStage);
IA -- context --> JA(FileRuleFilterStage);
JA -- context --> KA(GlossToRoughConversionStage);
KA -- context --> LA(AlphaExtractionToMaskStage);
LA -- context --> MA(NormalMapGreenChannelStage);
MA -- context --> NA(IndividualMapProcessingStage);
NA -- context --> OA(MapMergingStage);
OA -- context --> PA(MetadataFinalizationAndSaveStage);
PA -- context --> QA(OutputOrganizationStage);
QA -- context --> RA(Orchestrator: Record Processed/Failed);
RA --> CA;
CA -- No --> SA(Orchestrator: Cleanup Engine Temp Dir);
SA --> TA[Processing Complete];
subgraph Stages
direction LR
EA
FA
IA
JA
KA
LA
MA
NA
OA
PA
QA
end
subgraph Utils
direction LR
U1[image_processing_utils.py]
U2[utils/path_utils.py]
end
NA -.-> U1;
OA -.-> U1;
KA -.-> U1;
LA -.-> U1;
MA -.-> U1;
PA -.-> U2;
QA -.-> U2;
classDef context fill:#f9f,stroke:#333,stroke-width:2px;
class DA,EA,FA,IA,JA,KA,LA,MA,NA,OA,PA,QA context;