Pipeline simplification - Needs testing!

This commit is contained in:
2025-05-12 13:31:58 +02:00
parent 5bf53f036c
commit 4ffb2ff78c
7 changed files with 1046 additions and 1049 deletions


@@ -0,0 +1,154 @@
# Revised Refactoring Plan: Processing Pipeline
**Overall Goal:** To simplify the processing pipeline by refactoring the map merging process, consolidating map transformations (Gloss-to-Rough, Normal Green Invert), and creating a unified, configurable image saving utility. This plan aims to improve clarity, significantly reduce I/O by favoring in-memory operations, and make Power-of-Two (POT) scaling an optional, integrated step.
**I. Map Merging Stage (`processing/pipeline/stages/map_merging.py`)**
* **Objective:** Transform this stage from performing merges to generating tasks for merged images.
* **Changes to `MapMergingStage.execute()`:**
1. Iterate through `context.config_obj.map_merge_rules`.
2. Identify the required input map types and resolve their source file paths (either the original source paths or, where applicable, outputs of prior stages).
3. Create "merged image tasks" and add them to `context.merged_image_tasks`.
4. Each task entry will contain:
* `output_map_type`: Target map type (e.g., "MAP_NRMRGH").
* `input_map_sources`: Details of source map types and file paths.
* `merge_rule_config`: Complete merge rule configuration (including fallback values).
* `source_dimensions`: Dimensions for the high-resolution merged map basis.
* `source_bit_depths`: Information about the bit depth of original source maps (needed for "respect_inputs" rule in save utility).
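For illustration, a single task entry might be assembled as a plain dict. The field names follow the list above; the helper name and the value shapes are assumptions, not the engine's actual code:

```python
from pathlib import Path

def build_merged_image_task(output_map_type, input_map_sources,
                            merge_rule_config, source_dimensions,
                            source_bit_depths):
    """Assemble one entry for context.merged_image_tasks (shape is a sketch)."""
    return {
        "output_map_type": output_map_type,      # e.g. "MAP_NRMRGH"
        "input_map_sources": input_map_sources,  # {map_type: Path, or None if missing}
        "merge_rule_config": merge_rule_config,  # full rule, incl. fallback values
        "source_dimensions": source_dimensions,  # (width, height) of the hi-res basis
        "source_bit_depths": source_bit_depths,  # e.g. [8, 16], for "respect_inputs"
    }

# Hypothetical inputs for illustration:
task = build_merged_image_task(
    "MAP_NRMRGH",
    {"MAP_NRM": Path("asset_nrm.png"), "MAP_ROUGH": None},
    {"fallback_values": {"MAP_ROUGH": 128}},
    (4096, 4096),
    [8, 16],
)
```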
**II. Individual Map Processing Stage (`processing/pipeline/stages/individual_map_processing.py`)**
* **Objective:** Adapt this stage to handle both individual raw maps and `merged_image_tasks`. It will perform necessary in-memory transformations (Gloss-to-Rough, Normal Green Invert) and prepare a single "high-resolution" source image (in memory) to be passed to the `UnifiedSaveUtility`.
* **Changes to `IndividualMapProcessingStage.execute()`:**
1. **Input Handling Loop:** Iterate through `context.files_to_process` (regular maps) and `context.merged_image_tasks`.
2. **Image Data Preparation:**
* **For regular maps:** Load the source image file into memory (`current_image_data`). Determine `base_map_type` from the `FileRule`. Determine source bit depth.
* **For `merged_image_tasks`:**
* Attempt to load input map files specified in `input_map_sources`. If a file is missing, log a warning and generate placeholder data using fallback values from `merge_rule_config`. Handle other load errors.
* Check dimensions of loaded/fallback data. Apply `MERGE_DIMENSION_MISMATCH_STRATEGY` (e.g., resize, log warning) or handle "ERROR_SKIP" strategy (log error, mark task failed, continue).
* Perform the merge operation in memory according to `merge_rule_config`. Result is `current_image_data`. `base_map_type` is the task's `output_map_type`.
3. **In-Memory Transformations:**
* **Gloss-to-Rough Conversion:**
* If `base_map_type` starts with "MAP_GLOSS":
* Perform inversion on `current_image_data` (in memory).
* Update `base_map_type` to "MAP_ROUGH".
* Log the conversion.
* **Normal Map Green Channel Inversion:**
* If `base_map_type` is "NORMAL" *and* `context.config_obj.general_settings.invert_normal_map_green_channel_globally` is true:
* Perform green channel inversion on `current_image_data` (in memory).
* Log the inversion.
4. **Optional Initial Scaling (POT or other):**
* Check `INITIAL_SCALING_MODE` from config.
* If `"POT_DOWNSCALE"`: Perform POT downscaling on `current_image_data` (in memory) -> `image_to_save`.
* If `"NONE"`: `image_to_save` = `current_image_data`.
* *(Note: `image_to_save` now reflects any prior transformations)*.
5. **Color Management:** Apply necessary color management to `image_to_save`.
6. **Pass to Save Utility:** Pass `image_to_save`, the (potentially updated) `base_map_type`, original source bit depth info (for "respect_inputs" rule), and other necessary details (like specific config values) to the `UnifiedSaveUtility`.
7. **Remove Old Logic:** Remove the old save logic and the calls to the separate Gloss/Normal stages.
8. **Context Update:** Update `context.processed_maps_details` with results from the `UnifiedSaveUtility`, including notes about any conversions/inversions performed or merge task failures.
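The two in-memory transformations in step 3 can be sketched with NumPy. The 8-bit assumption and the helper signature are illustrative simplifications, not the engine's actual code:

```python
import numpy as np

def apply_in_memory_transforms(image, base_map_type, invert_green_globally):
    """Sketch of step 3: gloss-to-rough inversion and normal green-channel flip.

    Assumes 8-bit arrays; channel order and dtype handling are simplified.
    """
    if base_map_type.startswith("MAP_GLOSS"):
        image = 255 - image                  # rough = 1 - gloss, in 8-bit terms
        base_map_type = "MAP_ROUGH"
    if base_map_type == "NORMAL" and invert_green_globally:
        image = image.copy()
        image[..., 1] = 255 - image[..., 1]  # flip only the green channel
    return image, base_map_type

gloss = np.full((2, 2, 3), 200, dtype=np.uint8)
rough, map_type = apply_in_memory_transforms(gloss, "MAP_GLOSS", False)
```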
**III. Unified Image Save Utility (New file: `processing/utils/image_saving_utils.py`)**
* **Objective:** Centralize all image saving logic (resolution variants, format, bit depth, compression).
* **Interface (e.g., `save_image_variants` function):**
* **Inputs:**
* `source_image_data (np.ndarray)`: High-res image data (in memory, potentially transformed).
* `base_map_type (str)`: Final map type (e.g., "COL", "ROUGH", "NORMAL", "MAP_NRMRGH").
* `source_bit_depth_info (list)`: List of original source bit depth(s).
* Specific config values (e.g., `image_resolutions: dict`, `file_type_defs: dict`, `output_format_8bit: str`, etc.).
* `output_filename_pattern_tokens (dict)`.
* `output_base_directory (Path)`.
* **Core Functionality:**
1. Use provided configuration inputs.
2. Determine Target Bit Depth:
* Use `bit_depth_rule` for `base_map_type` from `file_type_defs`.
* If "force_8bit": target 8-bit.
* If "respect_inputs": If `any(depth > 8 for depth in source_bit_depth_info)`, target 16-bit, else 8-bit.
3. Determine Output File Format(s) (based on target bit depth, config).
4. Generate and Save Resolution Variants:
* Iterate through `image_resolutions`.
* Resize `source_image_data` (in memory) for each variant (no upscaling).
* Construct filename and path.
* Prepare save parameters.
* Convert variant data to target bit depth/color space just before saving.
* Save variant using `cv2.imwrite` or similar.
* Discard in-memory variant after saving.
5. Return List of Saved File Details: `{'path': str, 'resolution_key': str, 'format': str, 'bit_depth': int, 'dimensions': (w,h)}`.
* **Memory Management:** Holds `source_image_data` + one variant in memory at a time.
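The bit-depth rule in step 2 and the no-upscaling constraint in step 4 might look like this minimal sketch; the function names and the size-clamping policy are assumptions:

```python
def determine_target_bit_depth(base_map_type, file_type_defs, source_bit_depth_info):
    """Step 2: resolve the configured bit_depth_rule to a concrete 8 or 16."""
    rule = file_type_defs[base_map_type]["bit_depth_rule"]
    if rule == "force_8bit":
        return 8
    if rule == "respect_inputs":
        return 16 if any(depth > 8 for depth in source_bit_depth_info) else 8
    raise ValueError(f"Unknown bit_depth_rule: {rule!r}")

def variant_size(source_wh, target_long_edge):
    """Step 4's no-upscaling constraint: a variant never exceeds the source."""
    w, h = source_wh
    scale = min(1.0, target_long_edge / max(w, h))
    return (max(1, round(w * scale)), max(1, round(h * scale)))

# Hypothetical file_type_defs fragment:
defs = {"ROUGH": {"bit_depth_rule": "force_8bit"},
        "NORMAL": {"bit_depth_rule": "respect_inputs"}}
```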
**IV. Configuration Changes (`config/app_settings.json`)**
1. **Add/Confirm Settings:**
* `"INITIAL_SCALING_MODE": "POT_DOWNSCALE"` (Options: "POT_DOWNSCALE", "NONE").
* `"MERGE_DIMENSION_MISMATCH_STRATEGY": "USE_LARGEST"` (Options: "USE_LARGEST", "USE_FIRST", "ERROR_SKIP").
* Ensure `general_settings.invert_normal_map_green_channel_globally` exists (boolean).
2. **Review/Confirm Existing Settings:**
* Ensure `IMAGE_RESOLUTIONS`, `FILE_TYPE_DEFINITIONS` (`bit_depth_rule`), `MAP_MERGE_RULES` (`output_bit_depth`, fallback values), format settings, quality settings are comprehensive.
3. **Remove Obsolete Setting:**
* `RESPECT_VARIANT_MAP_TYPES`.
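Taken together, the added keys might sit in `config/app_settings.json` as in this fragment; the placement and surrounding keys are assumptions:

```json
{
  "INITIAL_SCALING_MODE": "POT_DOWNSCALE",
  "MERGE_DIMENSION_MISMATCH_STRATEGY": "USE_LARGEST",
  "general_settings": {
    "invert_normal_map_green_channel_globally": false
  }
}
```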
**V. Data Flow Diagram (Mermaid)**
```mermaid
graph TD
A[Start Asset Processing] --> B[File Rules Filter];
B --> STAGE_INDIVIDUAL_MAP_PROCESSING;
subgraph STAGE_INDIVIDUAL_MAP_PROCESSING [Individual Map Processing Stage]
direction LR
C1{Is it a regular map or merged task?}
C1 -- Regular Map --> C2["Load Source Image File into Memory (current_image_data)"];
C1 -- "Merged Task (from Map Merging Stage)" --> C3["Load Inputs (Handle Missing w/ Fallbacks) & Merge in Memory (Handle Dim Mismatch) (current_image_data)"];
C2 --> C4[current_image_data];
C3 --> C4;
C4 --> C4_TRANSFORM{Transformations?};
C4_TRANSFORM -- "Gloss Map?" --> C4a["Invert Data (in memory), Update base_map_type to ROUGH"];
C4_TRANSFORM -- "Normal Map & Invert Config?" --> C4b["Invert Green Channel (in memory)"];
C4_TRANSFORM -- No Transformation Needed --> C4_POST_TRANSFORM;
C4a --> C4_POST_TRANSFORM;
C4b --> C4_POST_TRANSFORM;
C4_POST_TRANSFORM["current_image_data (potentially transformed)"] --> C5{INITIAL_SCALING_MODE};
C5 -- "POT_DOWNSCALE" --> C6["Perform POT Scale (in memory) to produce image_to_save"];
C5 -- "NONE" --> C7[image_to_save = current_image_data];
C6 --> C8["Apply Color Management to image_to_save (in memory)"];
C7 --> C8;
C8 --> UNIFIED_SAVE_UTILITY["Call Unified Save Utility with image_to_save, final base_map_type, source bit depth info, config"];
end
UNIFIED_SAVE_UTILITY --> H["Update context.processed_maps_details with list of saved files & notes"];
H --> STAGE_METADATA_SAVE["Metadata Finalization & Save Stage"];
STAGE_MAP_MERGING[Map Merging Stage] --> N{Identify Merge Rules};
N --> O["Create Merged Image Tasks (incl. inputs, config, source bit depths)"];
%% Feed tasks from the Map Merging Stage into the processing stage
O --> STAGE_INDIVIDUAL_MAP_PROCESSING;
A --> STAGE_OTHER_INITIAL[Other Initial Stages]
STAGE_OTHER_INITIAL --> STAGE_MAP_MERGING;
STAGE_METADATA_SAVE --> Z[End Asset Processing];
subgraph UNIFIED_SAVE_UTILITY_DETAILS ["Unified Save Utility (processing.utils.image_saving_utils)"]
direction TB
INPUTS["Input: in-memory image_to_save, final base_map_type, source_bit_depth_info, config_params, tokens, out_base_dir"]
INPUTS --> CONFIG_LOAD[1. Use Provided Config Params]
CONFIG_LOAD --> DETERMINE_BIT_DEPTH["2. Determine Target Bit Depth (using rule & source_bit_depth_info)"]
DETERMINE_BIT_DEPTH --> DETERMINE_FORMAT[3. Determine Output Format]
DETERMINE_FORMAT --> LOOP_VARIANTS["4. For each Resolution:"]
LOOP_VARIANTS --> RESIZE_VARIANT["4a. Resize image_to_save to Variant (in memory)"]
RESIZE_VARIANT --> PREPARE_SAVE["4b. Prepare Filename & Save Params"]
PREPARE_SAVE --> SAVE_IMAGE["4c. Convert & Save Variant to Disk"]
SAVE_IMAGE --> LOOP_VARIANTS;
LOOP_VARIANTS --> OUTPUT_LIST[5. Return List of Saved File Details]
end
style STAGE_INDIVIDUAL_MAP_PROCESSING fill:#f9f,stroke:#333,stroke-width:2px;
style STAGE_MAP_MERGING fill:#f9f,stroke:#333,stroke-width:2px;
style UNIFIED_SAVE_UTILITY fill:#ccf,stroke:#333,stroke-width:2px;
style UNIFIED_SAVE_UTILITY_DETAILS fill:#ccf,stroke:#333,stroke-width:1px,stroke-dasharray:5 5;
style O fill:lightgrey,stroke:#333,stroke-width:2px;
style C4_POST_TRANSFORM fill:#e6ffe6,stroke:#333,stroke-width:1px;
```


@@ -1,181 +0,0 @@
# Project Plan: Modularizing the Asset Processing Engine
**Last Updated:** May 9, 2025
**1. Project Vision & Goals**
* **Vision:** Transform the asset processing pipeline into a highly modular, extensible, and testable system.
* **Primary Goals:**
1. Decouple processing steps into independent, reusable stages.
2. Simplify the addition of new processing capabilities (e.g., GLOSS > ROUGH conversion, Alpha to MASK, Normal Map Green Channel inversion).
3. Improve code maintainability and readability.
4. Enhance unit and integration testing capabilities for each processing component.
5. Centralize common utility functions (image manipulation, path generation).
**2. Proposed Architecture Overview**
* **Core Concept:** A `PipelineOrchestrator` will manage a sequence of `ProcessingStage`s. Each stage will operate on an `AssetProcessingContext` object, which carries all necessary data and state for a single asset through the pipeline.
* **Key Components:**
* `AssetProcessingContext`: Data class holding asset-specific data, configuration, temporary paths, and status.
* `PipelineOrchestrator`: Class to manage the overall processing flow for a `SourceRule`, iterating through assets and executing the pipeline of stages for each.
* `ProcessingStage` (Base Class/Interface): Defines the contract for all individual processing stages (e.g., `execute(context)` method).
* Specific Stage Classes: (e.g., `SupplierDeterminationStage`, `IndividualMapProcessingStage`, etc.)
* Utility Modules: `image_processing_utils.py`, enhancements to `utils/path_utils.py`.
**3. Proposed File Structure**
* `processing/`
* `pipeline/`
* `__init__.py`
* `asset_context.py` (Defines `AssetProcessingContext`)
* `orchestrator.py` (Defines `PipelineOrchestrator`)
* `stages/`
* `__init__.py`
* `base_stage.py` (Defines `ProcessingStage` interface)
* `supplier_determination.py`
* `asset_skip_logic.py`
* `metadata_initialization.py`
* `file_rule_filter.py`
* `gloss_to_rough_conversion.py`
* `alpha_extraction_to_mask.py`
* `normal_map_green_channel.py`
* `individual_map_processing.py`
* `map_merging.py`
* `metadata_finalization.py`
* `output_organization.py`
* `utils/`
* `__init__.py`
* `image_processing_utils.py` (New module for image functions)
* `utils/` (Top-level existing directory)
* `path_utils.py` (To be enhanced with `sanitize_filename` from `processing_engine.py`)
**4. Detailed Phases and Tasks**
**Phase 0: Setup & Core Structures Definition**
*Goal: Establish the foundational classes for the new pipeline.*
* **Task 0.1: Define `AssetProcessingContext`**
* Create `processing/pipeline/asset_context.py`.
* Define the `AssetProcessingContext` data class with fields: `source_rule: SourceRule`, `asset_rule: AssetRule`, `workspace_path: Path`, `engine_temp_dir: Path`, `output_base_path: Path`, `effective_supplier: Optional[str]`, `asset_metadata: Dict`, `processed_maps_details: Dict[str, Dict[str, Dict]]`, `merged_maps_details: Dict[str, Dict[str, Dict]]`, `files_to_process: List[FileRule]`, `loaded_data_cache: Dict`, `config_obj: Configuration`, `status_flags: Dict`, `incrementing_value: Optional[str]`, `sha5_value: Optional[str]`.
* Ensure proper type hinting.
* **Task 0.2: Define `ProcessingStage` Base Class/Interface**
* Create `processing/pipeline/stages/base_stage.py`.
* Define an abstract base class `ProcessingStage` with an abstract method `execute(self, context: AssetProcessingContext) -> AssetProcessingContext`.
* **Task 0.3: Implement Initial `PipelineOrchestrator`**
* Create `processing/pipeline/orchestrator.py`.
* Define the `PipelineOrchestrator` class.
* Implement `__init__(self, config_obj: Configuration, stages: List[ProcessingStage])`.
* Implement `process_source_rule(self, source_rule: SourceRule, workspace_path: Path, output_base_path: Path, overwrite: bool, incrementing_value: Optional[str], sha5_value: Optional[str]) -> Dict[str, List[str]]`.
* Handles creation/cleanup of the main engine temporary directory.
* Loops through `source_rule.assets`, initializes `AssetProcessingContext` for each.
* Iterates `self.stages`, calling `stage.execute(context)`.
* Collects overall status.
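The contract defined in Tasks 0.2 and 0.3 can be sketched as follows; the trimmed-down context class, the `skip_asset` flag name, and the example stage are illustrative assumptions:

```python
from abc import ABC, abstractmethod
from typing import List

class AssetProcessingContext:
    """Stand-in for the Task 0.1 data class (fields trimmed for brevity)."""
    def __init__(self):
        self.status_flags = {}

class ProcessingStage(ABC):
    """Task 0.2 contract: each stage transforms and returns the context."""
    @abstractmethod
    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        ...

class PipelineOrchestrator:
    """Minimal sketch of the per-asset stage loop from Task 0.3."""
    def __init__(self, stages: List[ProcessingStage]):
        self.stages = stages

    def run_asset(self, context: AssetProcessingContext) -> AssetProcessingContext:
        for stage in self.stages:
            context = stage.execute(context)
            if context.status_flags.get("skip_asset"):
                break  # a skip-logic stage may short-circuit the pipeline
        return context

class MarkVisitedStage(ProcessingStage):
    def execute(self, context):
        context.status_flags["visited"] = True
        return context

ctx = PipelineOrchestrator([MarkVisitedStage()]).run_asset(AssetProcessingContext())
```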
**Phase 1: Utility Module Refactoring**
*Goal: Consolidate and centralize common utility functions.*
* **Task 1.1: Refactor Path Utilities**
* Move `_sanitize_filename` from `processing_engine.py` to `utils/path_utils.py`.
* Update uses to call the new utility function.
* **Task 1.2: Create `image_processing_utils.py`**
* Create `processing/utils/image_processing_utils.py`.
* Move general-purpose image functions from `processing_engine.py`:
* `is_power_of_two`
* `get_nearest_pot`
* `calculate_target_dimensions`
* `calculate_image_stats`
* `normalize_aspect_ratio_change`
* Core image loading, BGR<>RGB conversion, generic resizing (from `_load_and_transform_source`).
* Core data type conversion for saving, color conversion for saving, `cv2.imwrite` call (from `_save_image`).
* Ensure functions are pure and testable.
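The two POT helpers named above admit a compact bit-twiddling form. `get_nearest_pot` here rounds down, which is one plausible reading of its contract for downscaling; the existing engine may round differently:

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (a power of two has a single set bit)."""
    return n > 0 and (n & (n - 1)) == 0

def get_nearest_pot(n: int) -> int:
    """Largest power of two at or below n (assumed rounding-down policy)."""
    if n < 1:
        raise ValueError("dimension must be positive")
    return 1 << (n.bit_length() - 1)
```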
**Phase 2: Implementing Core Processing Stages (Migrating Existing Logic)**
*Goal: Migrate existing functionalities from `processing_engine.py` into the new stage-based architecture.*
(For each task: create stage file, implement class, move logic, adapt to `AssetProcessingContext`)
* **Task 2.1: Implement `SupplierDeterminationStage`**
* **Task 2.2: Implement `AssetSkipLogicStage`**
* **Task 2.3: Implement `MetadataInitializationStage`**
* **Task 2.4: Implement `FileRuleFilterStage`** (New logic for `item_type == "FILE_IGNORE"`)
* **Task 2.5: Implement `IndividualMapProcessingStage`** (Adapts `_process_individual_maps`, uses `image_processing_utils.py`)
* **Task 2.6: Implement `MapMergingStage`** (Adapts `_merge_maps`, uses `image_processing_utils.py`)
* **Task 2.7: Implement `MetadataFinalizationAndSaveStage`** (Adapts `_generate_metadata_file`, uses `utils.path_utils.generate_path_from_pattern`)
* **Task 2.8: Implement `OutputOrganizationStage`** (Adapts `_organize_output_files`)
**Phase 3: Implementing New Feature Stages**
*Goal: Add the new desired processing capabilities as distinct stages.*
* **Task 3.1: Implement `GlossToRoughConversionStage`** (Identify gloss, convert, invert, save temp, update `FileRule`)
* **Task 3.2: Implement `AlphaExtractionToMaskStage`** (Check existing mask, find MAP_COL with alpha, extract, save temp, add new `FileRule`)
* **Task 3.3: Implement `NormalMapGreenChannelStage`** (Identify normal maps, invert green based on config, save temp, update `FileRule`)
**Phase 4: Integration, Testing & Finalization**
*Goal: Assemble the pipeline, test thoroughly, and deprecate old code.*
* **Task 4.1: Configure `PipelineOrchestrator`**
* Instantiate `PipelineOrchestrator` in main application logic with the ordered list of stage instances.
* **Task 4.2: Unit Testing**
* Unit tests for each `ProcessingStage` (mocking `AssetProcessingContext`).
* Unit tests for `image_processing_utils.py` and `utils/path_utils.py` functions.
* **Task 4.3: Integration Testing**
* Test `PipelineOrchestrator` end-to-end with sample data.
* Compare outputs with the existing engine for consistency.
* **Task 4.4: Documentation Update**
* Update developer documentation (e.g., `Documentation/02_Developer_Guide/05_Processing_Pipeline.md`).
* Document `AssetProcessingContext` and stage responsibilities.
* **Task 4.5: Deprecate/Remove Old `ProcessingEngine` Code**
* Gradually remove refactored logic from `processing_engine.py`.
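A Task 4.2 stage test might follow this pattern, mocking the context as suggested; the stage body shown is a hypothetical stand-in, not the real `SupplierDeterminationStage`:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

class SupplierDeterminationStage:
    """Hypothetical stand-in: falls back to 'UNKNOWN' when no supplier is set."""
    def execute(self, context):
        context.effective_supplier = context.asset_rule.supplier or "UNKNOWN"
        return context

def test_supplier_falls_back_to_unknown():
    context = MagicMock()
    context.asset_rule = SimpleNamespace(supplier=None)
    result = SupplierDeterminationStage().execute(context)
    assert result.effective_supplier == "UNKNOWN"

test_supplier_falls_back_to_unknown()
```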
**5. Workflow Diagram**
```mermaid
graph TD
AA[Load SourceRule & Config] --> BA(PipelineOrchestrator: process_source_rule);
BA --> CA{For Each Asset in SourceRule};
CA -- Yes --> DA(Orchestrator: Create AssetProcessingContext);
DA --> EA(SupplierDeterminationStage);
EA -- context --> FA(AssetSkipLogicStage);
FA -- context --> GA{context.skip_asset?};
GA -- Yes --> HA(Orchestrator: Record Skipped);
HA --> CA;
GA -- No --> IA(MetadataInitializationStage);
IA -- context --> JA(FileRuleFilterStage);
JA -- context --> KA(GlossToRoughConversionStage);
KA -- context --> LA(AlphaExtractionToMaskStage);
LA -- context --> MA(NormalMapGreenChannelStage);
MA -- context --> NA(IndividualMapProcessingStage);
NA -- context --> OA(MapMergingStage);
OA -- context --> PA(MetadataFinalizationAndSaveStage);
PA -- context --> QA(OutputOrganizationStage);
QA -- context --> RA(Orchestrator: Record Processed/Failed);
RA --> CA;
CA -- No --> SA(Orchestrator: Cleanup Engine Temp Dir);
SA --> TA[Processing Complete];
subgraph Stages
direction LR
EA
FA
IA
JA
KA
LA
MA
NA
OA
PA
QA
end
subgraph Utils
direction LR
U1[image_processing_utils.py]
U2[utils/path_utils.py]
end
NA -.-> U1;
OA -.-> U1;
KA -.-> U1;
LA -.-> U1;
MA -.-> U1;
PA -.-> U2;
QA -.-> U2;
classDef context fill:#f9f,stroke:#333,stroke-width:2px;
class DA,EA,FA,IA,JA,KA,LA,MA,NA,OA,PA,QA context;
```