Asset-Frameworker/ProjectNotes/Data_Flow_Refinement_Plan.md
Rusfort 6971b8189f Data Flow Overhaul
Known regressions in current commit:
- No "extra" files
- GLOSS map does not look corrected
- "override" flag is not respected
2025-05-01 09:13:20 +02:00


# Architectural Plan: Data Flow Refinement (v3)
**Date:** 2025-04-30
**Author:** Roo (Architect Mode)
**Status:** Approved
## 1. Goal
Refine the application's data flow to establish the GUI as the single source of truth for processing rules. This involves moving prediction/preset logic upstream from the backend processor and ensuring the backend receives a *complete* `SourceRule` object for processing, thereby simplifying the processor itself. This version of the plan involves creating a new processing module (`processing_engine.py`) instead of refactoring the existing `asset_processor.py`.
## 2. Proposed Data Flow
The refined data flow centralizes rule generation and modification within the GUI components before passing a complete, explicit rule set to the backend. The `SourceRule` object structure serves as a consistent data contract throughout the pipeline.
```mermaid
sequenceDiagram
participant User
participant GUI_MainWindow as GUI (main_window.py)
participant GUI_Predictor as Predictor (prediction_handler.py)
participant GUI_UnifiedView as Unified View (unified_view_model.py)
participant Main as main.py
participant ProcessingEngine as New Backend (processing_engine.py)
participant Config as config.py
User->>+GUI_MainWindow: Selects Input & Preset
Note over GUI_MainWindow: Scans input, gets file list
GUI_MainWindow->>+GUI_Predictor: Request Prediction(File List, Preset Name, Input ID)
GUI_Predictor->>+Config: Load Preset Rules & Canonical Types
Config-->>-GUI_Predictor: Return Rules & Types
%% Prediction Logic (Internal to Predictor)
Note over GUI_Predictor: Perform file analysis (based on list), apply preset rules, generate COMPLETE SourceRule hierarchy (only overridable fields populated)
GUI_Predictor-->>-GUI_MainWindow: Return List[SourceRule] (Initial Rules)
GUI_MainWindow->>+GUI_UnifiedView: Populate View(List[SourceRule])
GUI_UnifiedView->>+Config: Read Allowed Asset/File Types for Dropdowns
Config-->>-GUI_UnifiedView: Return Allowed Types
Note over GUI_UnifiedView: Display rules, allow user edits
User->>GUI_UnifiedView: Modifies Rules (Overrides)
GUI_UnifiedView-->>GUI_MainWindow: Update SourceRule Objects in Memory
User->>+GUI_MainWindow: Trigger Processing
GUI_MainWindow->>+Main: Send Final List[SourceRule]
Main->>+ProcessingEngine: Queue Task(SourceRule) for each input
Note over ProcessingEngine: Execute processing based *solely* on the provided SourceRule and static config. No internal prediction/fallback.
ProcessingEngine-->>-Main: Processing Result
Main-->>-GUI_MainWindow: Update Status
GUI_MainWindow-->>User: Show Result/Status
```
## 3. Module-Specific Changes
* **`config.py`:**
    * **Add Canonical Lists:** Introduce `ALLOWED_ASSET_TYPES` (e.g., `["Surface", "Model", "Decal", "Atlas", "UtilityMap"]`) and `ALLOWED_FILE_TYPES` (e.g., `["MAP_COL", "MAP_NRM", ..., "MODEL", "EXTRA", "FILE_IGNORE"]`).
    * **Purpose:** Single source of truth for GUI dropdowns and validation.
    * **Existing Config:** Retains static definitions like `IMAGE_RESOLUTIONS`, `MAP_MERGE_RULES`, `JPG_QUALITY`, etc.
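As a rough illustration of how the canonical lists could serve both dropdowns and validation, the sketch below defines them next to a small checking helper. The helper `validate_type` is hypothetical and not part of the plan, and `ALLOWED_FILE_TYPES` holds only the entries the plan names explicitly; the elided entries are left out rather than guessed.

```python
# Sketch of the canonical lists in config.py. Only values named in this plan
# appear here; the plan elides further file types, so this list is incomplete.
ALLOWED_ASSET_TYPES = ["Surface", "Model", "Decal", "Atlas", "UtilityMap"]
ALLOWED_FILE_TYPES = ["MAP_COL", "MAP_NRM", "MODEL", "EXTRA", "FILE_IGNORE"]


def validate_type(value: str, allowed: list[str], field_name: str) -> str:
    """Hypothetical helper: return value unchanged if allowed, else raise."""
    if value not in allowed:
        raise ValueError(f"{field_name}={value!r} is not one of {allowed}")
    return value
```

A GUI dropdown could be filled from the same list that `validate_type` checks against, which is what keeps the two from drifting apart.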
* **`rule_structure.py`:**
    * **Remove Enums:** Remove `AssetType` and `ItemType` Enums. Update `AssetRule.asset_type`, `FileRule.item_type_override`, etc., to use string types validated against `config.py` lists.
    * **Field Retention:** Keep `FileRule.resolution_override` and `FileRule.channel_merge_instructions` fields for structural consistency, but they will not be populated or used for overrides in this flow.
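One possible shape for the string-typed rule classes is sketched below. The field names `asset_type`, `item_type_override`, `target_asset_name_override`, `supplier_identifier`, `output_format_override`, `resolution_override`, and `channel_merge_instructions` come from this plan; which class owns each of them, and the extra fields `file_path`, `asset_name`, `files`, `assets`, and `input_source_identifier`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-ins for the config.py lists; file types limited to those this plan names.
ALLOWED_ASSET_TYPES = ["Surface", "Model", "Decal", "Atlas", "UtilityMap"]
ALLOWED_FILE_TYPES = ["MAP_COL", "MAP_NRM", "MODEL", "EXTRA", "FILE_IGNORE"]


@dataclass
class FileRule:
    file_path: str  # assumed field: path relative to the input root
    item_type_override: Optional[str] = None
    target_asset_name_override: Optional[str] = None
    output_format_override: Optional[str] = None
    # Kept for structural consistency; left unpopulated in this flow.
    resolution_override: Optional[tuple] = None
    channel_merge_instructions: Optional[dict] = None

    def __post_init__(self) -> None:
        # Checks run at construction only; later assignments are not re-validated.
        if self.item_type_override not in (None, *ALLOWED_FILE_TYPES):
            raise ValueError(f"unknown item type: {self.item_type_override!r}")


@dataclass
class AssetRule:
    asset_name: str  # assumed field
    asset_type: str
    files: list[FileRule] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.asset_type not in ALLOWED_ASSET_TYPES:
            raise ValueError(f"unknown asset type: {self.asset_type!r}")


@dataclass
class SourceRule:
    input_source_identifier: str  # assumed name, borrowed from the predictor input
    supplier_identifier: Optional[str] = None
    assets: list[AssetRule] = field(default_factory=list)
```

Plain strings keep the classes free of an Enum import while still rejecting unknown values when a rule is built.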
* **`gui/prediction_handler.py` (or equivalent):**
    * **Enhance Prediction Logic:** Modify `run_prediction` method.
        * **Input:** Accept `input_source_identifier` (string), `file_list` (List[str] of relative paths), and `preset_name` (string) when called from GUI.
        * **Load Config:** Read `ALLOWED_ASSET_TYPES`, `ALLOWED_FILE_TYPES`, and preset rules.
        * **Relocate Classification:** Integrate classification/naming logic (previously in `asset_processor.py`) to operate on the provided `file_list`.
        * **Generate Complete Rules:** Populate `SourceRule`, `AssetRule`, and `FileRule` objects.
            * Set initial values only for *overridable* fields (e.g., `asset_type`, `item_type_override`, `target_asset_name_override`, `supplier_identifier`, `output_format_override`) based on preset rules/defaults.
            * Explicitly **do not** populate static config fields like `FileRule.resolution_override` or `FileRule.channel_merge_instructions`.
        * **Temporary Files (If needed for non-GUI):** May need logic later to handle direct path inputs (CLI/Docker) involving temporary extraction/cleanup, but the primary GUI flow uses the provided list.
        * **Output:** Emit `rule_hierarchy_ready` signal with the `List[SourceRule]`.
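The prediction step described above might look roughly like the following. This is a sketch under several assumptions: the `PRESETS` table, its keys, and the keyword-in-filename matching are hypothetical stand-ins for the real preset format and classification logic, and the function returns the list where the actual handler would emit it through `rule_hierarchy_ready`.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class FileRule:
    file_path: str
    item_type_override: Optional[str] = None


@dataclass
class AssetRule:
    asset_name: str
    asset_type: str
    files: list[FileRule] = field(default_factory=list)


@dataclass
class SourceRule:
    input_source_identifier: str
    supplier_identifier: Optional[str] = None
    assets: list[AssetRule] = field(default_factory=list)


# Hypothetical preset table; the real preset format is not described in this plan.
PRESETS = {
    "demo": {
        "supplier": "DemoSupplier",
        "default_asset_type": "Surface",
        "keywords": {"_col": "MAP_COL", "_nrm": "MAP_NRM"},
    }
}


def run_prediction(input_source_identifier: str, file_list: list[str],
                   preset_name: str) -> list[SourceRule]:
    """Build rules from the file list alone; the filesystem is never touched."""
    preset = PRESETS[preset_name]
    files = []
    for rel_path in file_list:
        stem = PurePosixPath(rel_path).stem.lower()
        # First keyword found in the file stem wins; unmatched files become EXTRA.
        item_type = next(
            (ftype for kw, ftype in preset["keywords"].items() if kw in stem),
            "EXTRA",
        )
        files.append(FileRule(file_path=rel_path, item_type_override=item_type))
    asset = AssetRule(
        asset_name=PurePosixPath(input_source_identifier).stem,
        asset_type=preset["default_asset_type"],
        files=files,
    )
    # Only overridable fields are set; static-config fields are never written here.
    return [SourceRule(input_source_identifier, preset["supplier"], [asset])]
```

Because the function works from `file_list` rather than a path on disk, the GUI can scan an archive once and reuse the listing for prediction.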
* **NEW: `processing_engine.py` (New Module):**
    * **Purpose:** Contains a new class (e.g., `ProcessingEngine`) for executing the processing pipeline based solely on a complete `SourceRule` and static configuration. Replaces `asset_processor.py` in the main workflow.
    * **Initialization (`__init__`):** Takes the static `Configuration` object as input.
    * **Core Method (`process`):** Accepts a single, complete `SourceRule` object. Orchestrates processing steps (workspace setup, extraction, map processing, merging, metadata, organization, cleanup).
    * **Helper Methods (Refactored Logic):** Implement simplified versions of processing helpers (e.g., `_process_individual_maps`, `_merge_maps_from_source`, `_generate_metadata_file`, `_organize_output_files`, `_load_and_transform_source`, `_save_image`).
        * Retrieve *overridable* parameters directly from the input `SourceRule`.
        * Retrieve *static configuration* parameters (resolutions, merge rules) **only** from the stored `Configuration` object.
        * Contain **no** prediction, classification, or fallback logic.
    * **Dependencies:** `rule_structure.py`, `configuration.py`, `config.py`, `cv2`, `numpy`, etc.
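To make the "no fallback" contract concrete, here is a minimal skeleton of such an engine. It is a sketch only: the `Configuration` stand-in, the flattened `SourceRule` (without the `AssetRule` level), the skipping of `FILE_IGNORE` items, and the returned plan dictionary are all assumptions, and no image processing is performed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileRule:
    file_path: str
    item_type_override: Optional[str] = None


@dataclass
class SourceRule:  # flattened for brevity; the plan nests AssetRule in between
    input_source_identifier: str
    files: list[FileRule] = field(default_factory=list)


@dataclass
class Configuration:  # stand-in for the static configuration object
    image_resolutions: dict


class ProcessingEngine:
    def __init__(self, config: Configuration) -> None:
        self.config = config

    def process(self, rule: SourceRule) -> dict:
        """Plan work for one SourceRule using only the rule and static config."""
        planned = []
        for file_rule in rule.files:
            if file_rule.item_type_override is None:
                # No fallback: an incomplete rule is an error, not a cue to guess.
                raise ValueError(f"{file_rule.file_path}: item type missing from rule")
            if file_rule.item_type_override == "FILE_IGNORE":
                continue  # assumed meaning: the file is skipped entirely
            # Static parameters come only from the Configuration object.
            resolutions = sorted(self.config.image_resolutions)
            planned.append((file_rule.file_path, file_rule.item_type_override, resolutions))
        return {"source": rule.input_source_identifier, "planned": planned}
```

Raising on an incomplete rule, instead of classifying the file inside the engine, is what keeps prediction logic from creeping back into the backend.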
* **`asset_processor.py` (Old Module):**
    * **Status:** Remains in the codebase **unchanged** for reference.
    * **Usage:** No longer called by `main.py` or GUI for standard processing.
* **`gui/main_window.py`:**
    * **Scan Input:** Perform an initial scan of each input directory/archive to get its file list.
    * **Initiate Prediction:** Call `PredictionHandler` with the file list, preset, and input identifier.
    * **Receive/Pass Rules:** Handle `rule_hierarchy_ready`, pass `SourceRule` list to `UnifiedViewModel`.
    * **Send Final Rules:** Send the final `SourceRule` list to `main.py`.
* **`gui/unified_view_model.py` / `gui/delegates.py`:**
    * **Load Dropdown Options:** Source dropdowns (`AssetType`, `ItemType`) from `config.py`.
    * **Data Handling:** Read/write user modifications to overridable fields in `SourceRule` objects.
    * **No UI for Static Config:** Do not provide UI editing for resolution or merge instructions.
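The write path for user edits could be guarded along these lines. The function `set_file_override` and the `OVERRIDABLE_FILE_FIELDS` set are hypothetical names, and returning a boolean simply mirrors the convention of Qt's `setData`; the actual view model may handle edits differently.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for the config.py list; limited to the file types this plan names.
ALLOWED_FILE_TYPES = ["MAP_COL", "MAP_NRM", "MODEL", "EXTRA", "FILE_IGNORE"]

# Fields the view may edit; static-config fields are deliberately absent.
OVERRIDABLE_FILE_FIELDS = {
    "item_type_override",
    "target_asset_name_override",
    "output_format_override",
}


@dataclass
class FileRule:
    file_path: str
    item_type_override: Optional[str] = None
    target_asset_name_override: Optional[str] = None
    output_format_override: Optional[str] = None
    resolution_override: Optional[tuple] = None  # static config, never edited here


def set_file_override(rule: FileRule, field_name: str, value: str) -> bool:
    """Write a user edit into the rule; return False if the edit is rejected."""
    if field_name not in OVERRIDABLE_FILE_FIELDS:
        return False  # e.g. resolution_override has no UI editing
    if field_name == "item_type_override" and value not in ALLOWED_FILE_TYPES:
        return False  # dropdown values must come from the canonical list
    setattr(rule, field_name, value)
    return True
```

Editing the `SourceRule` objects in place means the list handed to `main.py` already contains every override, with no separate merge step.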
* **`main.py`:**
    * **Receive Rule List:** Accept `List[SourceRule]` from GUI.
    * **Instantiate New Engine:** Import and instantiate the new `ProcessingEngine` from `processing_engine.py`.
    * **Queue Tasks:** Iterate `SourceRule` list, queue tasks.
    * **Call New Engine:** Pass the individual `SourceRule` object to `ProcessingEngine.process` for each task.
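The queueing described for `main.py` can be sketched with the standard library as follows. The plan does not say which executor is used, so `ThreadPoolExecutor`, the `run_processing` function, and the stubbed `ProcessingEngine` and `SourceRule` are assumptions; sharing one engine across workers also assumes that `process` keeps no per-task state.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass
class SourceRule:  # minimal stand-in; the real class carries the full hierarchy
    input_source_identifier: str


class ProcessingEngine:  # stub standing in for processing_engine.ProcessingEngine
    def __init__(self, config: dict) -> None:
        self.config = config

    def process(self, rule: SourceRule) -> dict:
        return {"source": rule.input_source_identifier, "status": "processed"}


def run_processing(rules: list[SourceRule], config: dict, max_workers: int = 2) -> dict:
    """Queue one task per SourceRule and collect results keyed by input identifier."""
    engine = ProcessingEngine(config)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(engine.process, rule): rule for rule in rules}
        for future in as_completed(futures):
            source_id = futures[future].input_source_identifier
            try:
                results[source_id] = future.result()
            except Exception as exc:  # report the failure instead of aborting the batch
                results[source_id] = {"source": source_id, "status": f"failed: {exc}"}
    return results
```

Collecting a result per input, including failures, gives the GUI something to show for every queued `SourceRule`.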
## 4. Rationale / Benefits
* **Single Source of Truth:** GUI holds the final `SourceRule` objects.
* **Backend Simplification:** New `processing_engine.py` is focused solely on execution based on explicit rules and static config.
* **Decoupling:** Reduced coupling between GUI/prediction and backend processing.
* **Clarity:** Clearer data flow and component responsibilities.
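* **Maintainability:** With prediction and fallback logic removed from the engine, a defect can be traced to either rule generation or rule execution, which eases maintenance and debugging.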
* **Centralized Definitions:** `config.py` centralizes allowed types.
* **Preserves Reference:** Keeps `asset_processor.py` available for comparison.
* **Consistent Data Contract:** `SourceRule` structure is consistent from predictor output to engine input, so a non-GUI caller could potentially pass rules to the engine directly and bypass the GUI.
## 5. Potential Issues / Considerations
* **`PredictionHandler` Complexity:** Will require careful implementation of classification/rule population logic.
* **Performance:** Prediction now runs in the GUI layer, so it must stay fast and run off the main thread (threading) to keep the interface responsive.
* **Rule Structure Completeness:** Ensure `SourceRule` dataclasses hold all necessary *overridable* fields.
* **Preset Loading:** Robust preset loading/interpretation needed in `PredictionHandler`.
* **Static Config Loading:** Ensure the new `ProcessingEngine` correctly loads and uses the static `Configuration` object.
## 6. Documentation
This document (`ProjectNotes/Data_Flow_Refinement_Plan.md`) serves as the architectural plan. Relevant sections of the Developer Guide will need updating upon implementation.