Processing Pipeline Refactoring Plan

1. Problem Summary

The current processing pipeline, particularly the IndividualMapProcessingStage, exhibits maintainability challenges:

  • High Complexity: The stage handles too many responsibilities (loading, merging, transformations, scaling, saving).
  • Duplicated Logic: Image transformations (Gloss-to-Rough, Normal Green Invert) are duplicated inside the stage rather than delegated to dedicated stages, so they are not applied consistently.
  • Tight Coupling: Heavy reliance on the large, mutable AssetProcessingContext object creates implicit dependencies and makes isolated testing difficult.
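
The duplicated transformations could instead live in one shared utility module that every stage calls. A hedged sketch, assuming float image data in [0, 1] (the actual function names and data ranges in the codebase may differ):

```python
import numpy as np

def gloss_to_rough(gloss: np.ndarray) -> np.ndarray:
    # Roughness is the complement of glossiness; assumes float data in [0, 1].
    return 1.0 - gloss

def invert_normal_green(normal: np.ndarray) -> np.ndarray:
    # Flip the green channel of an RGB normal map (OpenGL <-> DirectX convention).
    out = normal.astype(np.float32, copy=True)
    out[..., 1] = 1.0 - out[..., 1]
    return out
```

With a single definition per transformation, a fix (e.g. to the assumed data range) lands in one place instead of every stage that duplicates the logic.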

2. Refactoring Goals

  • Improve code readability and understanding.
  • Enhance maintainability by localizing changes and removing duplication.
  • Increase testability through smaller, focused components with clear interfaces.
  • Clarify data dependencies between pipeline stages.
  • Adhere more closely to the Single Responsibility Principle (SRP).

3. Proposed New Pipeline Stages

Replace the existing IndividualMapProcessingStage with the following sequence of smaller, focused stages, executed by the PipelineOrchestrator for each processing item:

  1. PrepareProcessingItemsStage:

    • Responsibility: Identifies and lists all items (FileRule, MergeTaskDefinition) to be processed from the main context.
    • Output: Updates context.processing_items.
  2. RegularMapProcessorStage: (Handles FileRule items)

    • Responsibility: Loads the source image, determines the internal map type (including suffix), applies the relevant transformations (Gloss-to-Rough, Normal Green Invert), and records the original image metadata.
    • Output: ProcessedRegularMapData object containing transformed image data and metadata.
  3. MergedTaskProcessorStage: (Handles MergeTaskDefinition items)

    • Responsibility: Loads input images, applies transformations to inputs, handles fallbacks/resizing, performs merge operation.
    • Output: ProcessedMergedMapData object containing merged image data and metadata.
  4. InitialScalingStage: (Optional)

    • Responsibility: Applies configured scaling (e.g., POT downscale) to the processed image data received from the previous stage.
    • Output: Scaled image data.
  5. SaveVariantsStage:

    • Responsibility: Takes the final processed (and potentially scaled) image data and orchestrates saving variants using the save_image_variants utility.
    • Output: List of saved file details (saved_files_details).
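
The stage outputs named above could be modeled as small dataclasses behind a uniform stage interface. A minimal sketch; the field names beyond those mentioned in this plan are illustrative, and `image` would be an `np.ndarray` in practice:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ProcessedRegularMapData:
    # Output of RegularMapProcessorStage; `image` is an np.ndarray in practice.
    image: Any
    map_type: str                                 # internal map type, including suffix
    metadata: dict = field(default_factory=dict)  # original image metadata

@dataclass
class ProcessedMergedMapData:
    # Output of MergedTaskProcessorStage.
    image: Any
    metadata: dict = field(default_factory=dict)

class PipelineStage:
    """Uniform interface: each stage consumes a typed input and returns a typed
    output, so it can be unit-tested without the full AssetProcessingContext."""
    def run(self, stage_input):
        raise NotImplementedError
```

Typed outputs make each stage's contract explicit, which is what enables the isolated testing called out in the goals.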

4. Proposed Data Flow

  • Input/Output Objects: Key stages (RegularMapProcessor, MergedTaskProcessor, InitialScaling, SaveVariants) will use specific Input and Output dataclasses for clearer interfaces.
  • Orchestrator Role: The PipelineOrchestrator manages the overall flow. It calls each stage in turn, builds the next stage's input from the previous stage's output (extracting image data references and metadata), receives output objects, and integrates final results (such as saved file details) back into the main AssetProcessingContext.
  • Image Data Handling: Large image arrays (np.ndarray) are passed primarily via stage return values (Output objects) and used as inputs to subsequent stages, managed by the Orchestrator. They are not stored long-term in the main AssetProcessingContext.
  • Main Context: The AssetProcessingContext remains for overall state (rules, paths, configuration access, final status tracking) and potentially for simpler stages with minimal side effects.
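
The data flow above can be sketched as a per-item orchestrator loop. This is a simplified stand-in, not the real orchestrator: the stage callables and context fields other than `processing_items` and `saved_files_details` are placeholders:

```python
from dataclasses import dataclass, field

@dataclass
class AssetProcessingContext:
    # Simplified stand-in; the real context also carries rules, paths, and configuration.
    processing_items: list = field(default_factory=list)
    saved_files_details: list = field(default_factory=list)

def run_pipeline(context, process_item, scale, save_variants):
    """Per-item loop: image data flows between stages via return values and is
    never stored long-term on the main context."""
    for item in context.processing_items:
        processed = process_item(item)      # RegularMap- or MergedTaskProcessorStage
        scaled = scale(processed)           # optional InitialScalingStage
        details = save_variants(scaled)     # SaveVariantsStage
        context.saved_files_details.extend(details)
    return context

# Usage with trivial stand-in stages:
ctx = AssetProcessingContext(processing_items=["albedo", "normal"])
run_pipeline(
    ctx,
    process_item=lambda item: {"name": item},
    scale=lambda data: data,
    save_variants=lambda data: [data["name"] + "_2k.png"],
)
# ctx.saved_files_details is now ['albedo_2k.png', 'normal_2k.png']
```

Because only the final `saved_files_details` touch the context, large image arrays stay out of shared mutable state and are released as soon as the loop iteration ends.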

5. Visualization (Conceptual)

```mermaid
graph TD
    subgraph Proposed Pipeline Stages
        Start --> Prep[PrepareProcessingItemsStage]
        Prep --> ItemLoop{Loop per Item}
        ItemLoop -- FileRule --> RegProc[RegularMapProcessorStage]
        ItemLoop -- MergeTask --> MergeProc[MergedTaskProcessorStage]
        RegProc --> Scale(InitialScalingStage)
        MergeProc --> Scale
        Scale --> Save[SaveVariantsStage]
        Save --> UpdateContext[Update Main Context w/ Results]
        UpdateContext --> ItemLoop
    end
```

6. Benefits

  • Improved Readability & Understanding: each stage has a single, clearly named responsibility.
  • Enhanced Maintainability & Reduced Risk: changes are localized to one stage, and duplicated transformation logic is removed.
  • Better Testability: small stages with explicit Input/Output dataclasses can be tested in isolation.
  • Clearer Dependencies: data flow is explicit in stage interfaces rather than implicit through the mutable context.