Asset-Frameworker/ProjectNotes/PipelineRefactoringPlan.md

72 lines
4.2 KiB
Markdown

# Processing Pipeline Refactoring Plan
## 1. Problem Summary
The current processing pipeline, particularly the `IndividualMapProcessingStage`, exhibits maintainability challenges:
* **High Complexity:** The stage handles too many responsibilities (loading, merging, transformations, scaling, saving).
* **Duplicated Logic:** Image transformations (Gloss-to-Rough, Normal Green Invert) are duplicated within the stage instead of relying solely on dedicated stages or being handled consistently.
* **Tight Coupling:** Heavy reliance on the large, mutable `AssetProcessingContext` object creates implicit dependencies and makes isolated testing difficult.
## 2. Refactoring Goals
* Improve code readability and understanding.
* Enhance maintainability by localizing changes and removing duplication.
* Increase testability through smaller, focused components with clear interfaces.
* Clarify data dependencies between pipeline stages.
* Adhere more closely to the Single Responsibility Principle (SRP).
## 3. Proposed New Pipeline Stages
Replace the existing `IndividualMapProcessingStage` with the following sequence of smaller, focused stages, executed by the `PipelineOrchestrator` for each processing item:
1. **`PrepareProcessingItemsStage`:**
* **Responsibility:** Identifies and lists all items (`FileRule`, `MergeTaskDefinition`) to be processed from the main context.
* **Output:** Updates `context.processing_items`.
2. **`RegularMapProcessorStage`:** (Handles `FileRule` items)
* **Responsibility:** Loads source image, determines internal map type (with suffix), applies relevant transformations (Gloss-to-Rough, Normal Green Invert), determines original metadata.
* **Output:** `ProcessedRegularMapData` object containing transformed image data and metadata.
3. **`MergedTaskProcessorStage`:** (Handles `MergeTaskDefinition` items)
* **Responsibility:** Loads input images, applies transformations to inputs, handles fallbacks/resizing, performs merge operation.
* **Output:** `ProcessedMergedMapData` object containing merged image data and metadata.
4. **`InitialScalingStage`:** (Optional)
* **Responsibility:** Applies configured scaling (e.g., POT downscale) to the processed image data received from the previous stage.
* **Output:** Scaled image data.
5. **`SaveVariantsStage`:**
* **Responsibility:** Takes the final processed (and potentially scaled) image data and orchestrates saving variants using the `save_image_variants` utility.
* **Output:** List of saved file details (`saved_files_details`).
## 4. Proposed Data Flow
* **Input/Output Objects:** Key stages (`RegularMapProcessor`, `MergedTaskProcessor`, `InitialScaling`, `SaveVariants`) will use specific Input and Output dataclasses for clearer interfaces.
* **Orchestrator Role:** The `PipelineOrchestrator` manages the overall flow. It calls stages, passes necessary data (extracting image data references and metadata from previous stage outputs to create inputs for the next), receives output objects, and integrates final results (like saved file details) back into the main `AssetProcessingContext`.
* **Image Data Handling:** Large image arrays (`np.ndarray`) are passed primarily via stage return values (Output objects) and used as inputs to subsequent stages, managed by the Orchestrator. They are not stored long-term in the main `AssetProcessingContext`.
* **Main Context:** The `AssetProcessingContext` remains for overall state (rules, paths, configuration access, final status tracking) and potentially for simpler stages with minimal side effects.
## 5. Visualization (Conceptual)
```mermaid
graph TD
subgraph Proposed Pipeline Stages
Start --> Prep[PrepareProcessingItemsStage]
Prep --> ItemLoop{Loop per Item}
ItemLoop -- FileRule --> RegProc[RegularMapProcessorStage]
ItemLoop -- MergeTask --> MergeProc[MergedTaskProcessorStage]
RegProc --> Scale(InitialScalingStage)
MergeProc --> Scale
Scale --> Save[SaveVariantsStage]
Save --> UpdateContext[Update Main Context w/ Results]
UpdateContext --> ItemLoop
end
```
## 6. Benefits
* Improved Readability & Understanding.
* Enhanced Maintainability & Reduced Risk.
* Better Testability.
* Clearer Dependencies.