Asset-Frameworker/ProjectNotes/ProcessingEngineRefactorPlan.md

Project Plan: Modularizing the Asset Processing Engine

Last Updated: May 9, 2025

1. Project Vision & Goals

  • Vision: Transform the asset processing pipeline into a highly modular, extensible, and testable system.
  • Primary Goals:
    1. Decouple processing steps into independent, reusable stages.
    2. Simplify the addition of new processing capabilities (e.g., gloss-to-roughness conversion, alpha-to-mask extraction, normal map green channel inversion).
    3. Improve code maintainability and readability.
    4. Enhance unit and integration testing capabilities for each processing component.
    5. Centralize common utility functions (image manipulation, path generation).

2. Proposed Architecture Overview

  • Core Concept: A PipelineOrchestrator will manage a sequence of ProcessingStages. Each stage will operate on an AssetProcessingContext object, which carries all necessary data and state for a single asset through the pipeline.
  • Key Components:
    • AssetProcessingContext: Data class holding asset-specific data, configuration, temporary paths, and status.
    • PipelineOrchestrator: Class to manage the overall processing flow for a SourceRule, iterating through assets and executing the pipeline of stages for each.
    • ProcessingStage (Base Class/Interface): Defines the contract for all individual processing stages (e.g., execute(context) method).
    • Specific Stage Classes: (e.g., SupplierDeterminationStage, IndividualMapProcessingStage, etc.)
    • Utility Modules: image_processing_utils.py, enhancements to utils/path_utils.py.
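
As a rough illustration of the core concept above, the stage contract and the sequential hand-off of the context could look like the following. This is a minimal sketch, not the project's implementation: the stripped-down AssetProcessingContext, its asset_name field, RecordVisitStage, and run_stages are all hypothetical names introduced only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AssetProcessingContext:
    """Stripped-down stand-in; the planned context carries many more fields."""

    asset_name: str  # hypothetical field, used only for this illustration
    status_flags: Dict[str, Any] = field(default_factory=dict)


class ProcessingStage(ABC):
    """Contract every stage implements: take a context, return a context."""

    @abstractmethod
    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        raise NotImplementedError


class RecordVisitStage(ProcessingStage):
    """Toy stage that only records that it ran (hypothetical, not in the plan)."""

    def __init__(self, label: str) -> None:
        self.label = label

    def execute(self, context: AssetProcessingContext) -> AssetProcessingContext:
        context.status_flags.setdefault("visited", []).append(self.label)
        return context


def run_stages(
    stages: List[ProcessingStage], context: AssetProcessingContext
) -> AssetProcessingContext:
    """Run stages in list order, handing each one the previous stage's context."""
    for stage in stages:
        context = stage.execute(context)
    return context


context = run_stages(
    [RecordVisitStage("supplier"), RecordVisitStage("skip_logic")],
    AssetProcessingContext(asset_name="example_asset"),
)
print(context.status_flags["visited"])  # ['supplier', 'skip_logic']
```

Because ProcessingStage declares execute as abstract, instantiating the base class directly raises TypeError, so a stage that forgets to implement the contract fails at construction time rather than mid-pipeline.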

3. Proposed File Structure

  • processing/
    • pipeline/
      • __init__.py
      • asset_context.py (Defines AssetProcessingContext)
      • orchestrator.py (Defines PipelineOrchestrator)
      • stages/
        • __init__.py
        • base_stage.py (Defines ProcessingStage interface)
        • supplier_determination.py
        • asset_skip_logic.py
        • metadata_initialization.py
        • file_rule_filter.py
        • gloss_to_rough_conversion.py
        • alpha_extraction_to_mask.py
        • normal_map_green_channel.py
        • individual_map_processing.py
        • map_merging.py
        • metadata_finalization.py
        • output_organization.py
    • utils/
      • __init__.py
      • image_processing_utils.py (New module for image functions)
  • utils/ (Top-level existing directory)
    • path_utils.py (To be enhanced with sanitize_filename from processing_engine.py)

4. Detailed Phases and Tasks

Phase 0: Setup & Core Structures Definition

Goal: Establish the foundational classes for the new pipeline.

  • Task 0.1: Define AssetProcessingContext
    • Create processing/pipeline/asset_context.py.
    • Define the AssetProcessingContext data class with the following fields:
      • source_rule: SourceRule
      • asset_rule: AssetRule
      • workspace_path: Path
      • engine_temp_dir: Path
      • output_base_path: Path
      • effective_supplier: Optional[str]
      • asset_metadata: Dict
      • processed_maps_details: Dict[str, Dict[str, Dict]]
      • merged_maps_details: Dict[str, Dict[str, Dict]]
      • files_to_process: List[FileRule]
      • loaded_data_cache: Dict
      • config_obj: Configuration
      • status_flags: Dict
      • incrementing_value: Optional[str]
      • sha5_value: Optional[str]
    • Ensure proper type hinting.
  • Task 0.2: Define ProcessingStage Base Class/Interface
    • Create processing/pipeline/stages/base_stage.py.
    • Define an abstract base class ProcessingStage with an abstract method execute(self, context: AssetProcessingContext) -> AssetProcessingContext.
  • Task 0.3: Implement Initial PipelineOrchestrator
    • Create processing/pipeline/orchestrator.py.
    • Define the PipelineOrchestrator class.
    • Implement __init__(self, config_obj: Configuration, stages: List[ProcessingStage]).
    • Implement process_source_rule(self, source_rule: SourceRule, workspace_path: Path, output_base_path: Path, overwrite: bool, incrementing_value: Optional[str], sha5_value: Optional[str]) -> Dict[str, List[str]].
      • Handles creation/cleanup of the main engine temporary directory.
      • Loops through source_rule.assets, initializes AssetProcessingContext for each.
      • Iterates self.stages, calling stage.execute(context).
      • Collects overall status.
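
The Phase 0 tasks could be sketched roughly as follows. This is an illustration under stated assumptions rather than the planned code: SourceRule, AssetRule, FileRule, and Configuration are replaced by Any, the "processed"/"skipped"/"failed" keys of the returned dictionary, the status_flags["skip_asset"] and status_flags["overwrite"] entries, and the asset_name attribute are guesses, and the fields are reordered so that required ones precede those with defaults.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class AssetProcessingContext:
    # Required fields first; Any stands in for the project's own rule/config types.
    source_rule: Any
    asset_rule: Any
    workspace_path: Path
    engine_temp_dir: Path
    output_base_path: Path
    config_obj: Any
    # Fields that stages fill in as the asset moves through the pipeline.
    effective_supplier: Optional[str] = None
    asset_metadata: Dict[str, Any] = field(default_factory=dict)
    processed_maps_details: Dict[str, Dict[str, Dict]] = field(default_factory=dict)
    merged_maps_details: Dict[str, Dict[str, Dict]] = field(default_factory=dict)
    files_to_process: List[Any] = field(default_factory=list)
    loaded_data_cache: Dict[str, Any] = field(default_factory=dict)
    status_flags: Dict[str, Any] = field(default_factory=dict)
    incrementing_value: Optional[str] = None
    sha5_value: Optional[str] = None


class PipelineOrchestrator:
    def __init__(self, config_obj: Any, stages: List[Any]) -> None:
        self.config_obj = config_obj
        self.stages = list(stages)

    def process_source_rule(
        self,
        source_rule: Any,
        workspace_path: Path,
        output_base_path: Path,
        overwrite: bool,
        incrementing_value: Optional[str] = None,
        sha5_value: Optional[str] = None,
    ) -> Dict[str, List[str]]:
        # Assumed result shape: asset names grouped by outcome.
        status: Dict[str, List[str]] = {"processed": [], "skipped": [], "failed": []}
        engine_temp_dir = Path(tempfile.mkdtemp(prefix="engine_"))
        try:
            for asset_rule in source_rule.assets:
                # asset_name is an assumed attribute; fall back to repr().
                name = getattr(asset_rule, "asset_name", repr(asset_rule))
                context = AssetProcessingContext(
                    source_rule=source_rule,
                    asset_rule=asset_rule,
                    workspace_path=workspace_path,
                    engine_temp_dir=engine_temp_dir,
                    output_base_path=output_base_path,
                    config_obj=self.config_obj,
                    incrementing_value=incrementing_value,
                    sha5_value=sha5_value,
                )
                context.status_flags["overwrite"] = overwrite  # assumed flag name
                try:
                    for stage in self.stages:
                        context = stage.execute(context)
                        if context.status_flags.get("skip_asset"):  # assumed flag
                            break
                except Exception:
                    # One failing asset should not abort the whole source rule.
                    status["failed"].append(name)
                    continue
                skipped = context.status_flags.get("skip_asset")
                status["skipped" if skipped else "processed"].append(name)
        finally:
            # Cleanup runs even if context creation or a stage raises unexpectedly.
            shutil.rmtree(engine_temp_dir, ignore_errors=True)
        return status
```

Using default_factory for every mutable field gives each asset its own dictionaries and lists, so state written by a stage for one asset cannot leak into the next. A real implementation would probably also log the exception caught in the failure branch instead of discarding it.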

Phase 1: Utility Module Refactoring

Goal: Consolidate and centralize common utility functions.

  • Task 1.1: Refactor Path Utilities
    • Move _sanitize_filename from processing_engine.py to utils/path_utils.py, exposing it as sanitize_filename.
    • Update all call sites to use the new utility function.
  • Task 1.2: Create image_processing_utils.py
    • Create processing/utils/image_processing_utils.py.
    • Move general-purpose image functions from processing_engine.py:
      • is_power_of_two
      • get_nearest_pot
      • calculate_target_dimensions
      • calculate_image_stats
      • normalize_aspect_ratio_change
      • Core image loading, BGR<->RGB conversion, generic resizing (from _load_and_transform_source).
      • Core data type conversion for saving, color conversion for saving, cv2.imwrite call (from _save_image).
    • Ensure functions are pure and testable.
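
To show what "pure and testable" might mean for a few of these helpers, here is a dependency-free sketch. The function names come from the plan, but the bodies are assumptions: the existing code in processing_engine.py may round differently or allow a different character set.

```python
import re


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def get_nearest_pot(n: int) -> int:
    """Return the power of two closest to n (assumption: ties round up)."""
    if n < 1:
        return 1
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if (n - lower) < (upper - n) else upper


def sanitize_filename(name: str) -> str:
    """Replace characters other than word characters, dots, and hyphens.

    The allowed character set and the 'unnamed' fallback are assumptions.
    """
    cleaned = re.sub(r"[^\w.\-]", "_", name)
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    return cleaned or "unnamed"


print(is_power_of_two(1024))                   # True
print(get_nearest_pot(1000))                   # 1024
print(get_nearest_pot(700))                    # 512
print(sanitize_filename("My Asset: v2?.png"))  # My_Asset_v2_.png
```

None of these functions touch the filesystem or global state, so each can be covered by plain input/output assertions without fixtures.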

Phase 2: Implementing Core Processing Stages (Migrating Existing Logic)

Goal: Migrate existing functionality from processing_engine.py into the new stage-based architecture. For each task: create the stage file, implement the class, move the logic, and adapt it to AssetProcessingContext.

  • Task 2.1: Implement SupplierDeterminationStage
  • Task 2.2: Implement AssetSkipLogicStage
  • Task 2.3: Implement MetadataInitializationStage
  • Task 2.4: Implement FileRuleFilterStage (New logic for item_type == "FILE_IGNORE")
  • Task 2.5: Implement IndividualMapProcessingStage (Adapts _process_individual_maps, uses image_processing_utils.py)
  • Task 2.6: Implement MapMergingStage (Adapts _merge_maps, uses image_processing_utils.py)
  • Task 2.7: Implement MetadataFinalizationAndSaveStage (Adapts _generate_metadata_file, uses utils.path_utils.generate_path_from_pattern)
  • Task 2.8: Implement OutputOrganizationStage (Adapts _organize_output_files)
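
As an example of how small a migrated stage can be, FileRuleFilterStage (Task 2.4) might reduce to a single filter over the context. This sketch assumes the context's files_to_process list has already been populated with the asset's file rules; the FileRule and Context classes here are simplified stand-ins, and file_path and the "MAP_COL" sample value are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileRule:
    """Simplified stand-in for the project's FileRule."""

    file_path: str  # hypothetical attribute name
    item_type: str


@dataclass
class Context:
    """Stand-in exposing only the field this stage touches."""

    files_to_process: List[FileRule] = field(default_factory=list)


class FileRuleFilterStage:
    """Drop every file rule explicitly marked as ignored."""

    IGNORED_TYPE = "FILE_IGNORE"

    def execute(self, context: Context) -> Context:
        context.files_to_process = [
            rule
            for rule in context.files_to_process
            if rule.item_type != self.IGNORED_TYPE
        ]
        return context


context = Context(
    files_to_process=[
        FileRule("wood_col.png", "MAP_COL"),
        FileRule("preview.jpg", "FILE_IGNORE"),
    ]
)
context = FileRuleFilterStage().execute(context)
print([rule.file_path for rule in context.files_to_process])  # ['wood_col.png']
```

Running the filter early means later stages never need their own FILE_IGNORE checks.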

Phase 3: Implementing New Feature Stages

Goal: Add the new desired processing capabilities as distinct stages.

  • Task 3.1: Implement GlossToRoughConversionStage (Identify gloss, convert, invert, save temp, update FileRule)
  • Task 3.2: Implement AlphaExtractionToMaskStage (Check existing mask, find MAP_COL with alpha, extract, save temp, add new FileRule)
  • Task 3.3: Implement NormalMapGreenChannelStage (Identify normal maps, invert green based on config, save temp, update FileRule)
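
The pixel arithmetic behind Tasks 3.1 and 3.3 is a simple inversion: roughness is the complement of glossiness, and flipping a normal map's green channel replaces G with max - G. The sketch below uses nested Python lists to stay dependency-free; the actual stages would presumably operate on arrays loaded through image_processing_utils.py, and both function names are hypothetical.

```python
from typing import List, Tuple

GrayImage = List[List[int]]
ColorImage = List[List[Tuple[int, int, int]]]


def gloss_to_rough(gloss: GrayImage, max_value: int = 255) -> GrayImage:
    """Invert a single-channel gloss map: rough = max_value - gloss."""
    return [[max_value - value for value in row] for row in gloss]


def invert_green_channel(normal: ColorImage, max_value: int = 255) -> ColorImage:
    """Invert only the middle channel of each pixel.

    Green sits at index 1 in both RGB and BGR order, so the channel
    order of the loaded image does not matter here.
    """
    return [
        [(first, max_value - green, last) for (first, green, last) in row]
        for row in normal
    ]


print(gloss_to_rough([[0, 64], [191, 255]]))    # [[255, 191], [64, 0]]
print(invert_green_channel([[(128, 200, 255)]]))  # [[(128, 55, 255)]]
```

For 16-bit images the same functions apply with max_value=65535. Both operations are their own inverse, which gives a convenient round-trip property to assert in unit tests.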

Phase 4: Integration, Testing & Finalization

Goal: Assemble the pipeline, test thoroughly, and deprecate old code.

  • Task 4.1: Configure PipelineOrchestrator
    • Instantiate PipelineOrchestrator in main application logic with the ordered list of stage instances.
  • Task 4.2: Unit Testing
    • Unit tests for each ProcessingStage (mocking AssetProcessingContext).
    • Unit tests for image_processing_utils.py and utils/path_utils.py functions.
  • Task 4.3: Integration Testing
    • Test PipelineOrchestrator end-to-end with sample data.
    • Compare outputs with the existing engine for consistency.
  • Task 4.4: Documentation Update
    • Update developer documentation (e.g., Documentation/02_Developer_Guide/05_Processing_Pipeline.md).
    • Document AssetProcessingContext and stage responsibilities.
  • Task 4.5: Deprecate/Remove Old ProcessingEngine Code
    • Gradually remove refactored logic from processing_engine.py.
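
A stage-level unit test with a mocked context (Task 4.2) might look like the sketch below. The AssetSkipLogicStage body is a hypothetical minimal version written only so the test has something to exercise; the process_status attribute, its "SKIP" value, and the status_flags["skip_asset"] key are assumptions, not the project's actual skip rules.

```python
import unittest
from unittest.mock import MagicMock


class AssetSkipLogicStage:
    """Hypothetical minimal stage: flag assets whose rule asks to be skipped."""

    def execute(self, context):
        if context.asset_rule.process_status == "SKIP":  # assumed attribute
            context.status_flags["skip_asset"] = True    # assumed flag name
        return context


class AssetSkipLogicStageTest(unittest.TestCase):
    def _make_context(self, process_status: str) -> MagicMock:
        # MagicMock creates nested attributes on demand, so only the
        # values the stage actually reads or writes need to be set.
        context = MagicMock()
        context.status_flags = {}
        context.asset_rule.process_status = process_status
        return context

    def test_marks_asset_as_skipped(self):
        context = self._make_context("SKIP")
        result = AssetSkipLogicStage().execute(context)
        self.assertIs(result, context)
        self.assertTrue(result.status_flags["skip_asset"])

    def test_leaves_other_assets_untouched(self):
        context = self._make_context("PROCESS")
        result = AssetSkipLogicStage().execute(context)
        self.assertNotIn("skip_asset", result.status_flags)
```

Saved as a test module, this can be run with python -m unittest. Assigning a real dictionary to status_flags, rather than leaving it as a mock, keeps the assertions meaningful: a bare MagicMock would accept any key lookup without failing.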

5. Workflow Diagram

graph TD
    AA[Load SourceRule & Config] --> BA(PipelineOrchestrator: process_source_rule);
    BA --> CA{For Each Asset in SourceRule};
    CA -- Yes --> DA(Orchestrator: Create AssetProcessingContext);
    DA --> EA(SupplierDeterminationStage);
    EA -- context --> FA(AssetSkipLogicStage);
    FA -- context --> GA{context.skip_asset?};
    GA -- Yes --> HA(Orchestrator: Record Skipped);
    HA --> CA;
    GA -- No --> IA(MetadataInitializationStage);
    IA -- context --> JA(FileRuleFilterStage);
    JA -- context --> KA(GlossToRoughConversionStage);
    KA -- context --> LA(AlphaExtractionToMaskStage);
    LA -- context --> MA(NormalMapGreenChannelStage);
    MA -- context --> NA(IndividualMapProcessingStage);
    NA -- context --> OA(MapMergingStage);
    OA -- context --> PA(MetadataFinalizationAndSaveStage);
    PA -- context --> QA(OutputOrganizationStage);
    QA -- context --> RA(Orchestrator: Record Processed/Failed);
    RA --> CA;
    CA -- No --> SA(Orchestrator: Cleanup Engine Temp Dir);
    SA --> TA[Processing Complete];

    subgraph Stages
        direction LR
        EA
        FA
        IA
        JA
        KA
        LA
        MA
        NA
        OA
        PA
        QA
    end

    subgraph Utils
        direction LR
        U1[image_processing_utils.py]
        U2[utils/path_utils.py]
    end

    NA -.-> U1;
    OA -.-> U1;
    KA -.-> U1;
    LA -.-> U1;
    MA -.-> U1;

    PA -.-> U2;
    QA -.-> U2;

    classDef context fill:#f9f,stroke:#333,stroke-width:2px;
    class DA,EA,FA,IA,JA,KA,LA,MA,NA,OA,PA,QA context;