Dedicated LLM settings file - UNTESTED!

This commit is contained in:
Rusfort 2025-05-04 13:24:10 +02:00
parent 01c8f68ea0
commit 336d698f9b
7 changed files with 403 additions and 381 deletions


@@ -6,10 +6,11 @@ This document provides technical details about the configuration system and the
The tool utilizes a two-tiered configuration system managed by the `configuration.py` module:
1. **Application Settings (`config/app_settings.json`):** This JSON file defines the core global default settings, constants, and rules that apply generally across different asset sources. Examples include default output paths, standard image resolutions, map merge rules, output format rules, Blender executable paths, and default map types. It also centrally defines metadata for allowed asset and file types. Key sections include `FILE_TYPE_DEFINITIONS`, `ASSET_TYPE_DEFINITIONS`, and `MAP_MERGE_RULES`.
2. **Preset Files (`Presets/*.json`):** These JSON files define supplier-specific rules and overrides. They contain patterns (often regular expressions) to interpret filenames, classify map types, handle variants, define naming conventions, and specify other source-specific behaviors.
1. **Application Settings (`config/app_settings.json`):** This JSON file defines the core global default settings, constants, and rules that apply generally across different asset sources (e.g., default output paths, standard image resolutions, map merge rules, output format rules, Blender paths, `FILE_TYPE_DEFINITIONS`, `ASSET_TYPE_DEFINITIONS`). **LLM-specific settings are now located in `config/llm_settings.json`.**
2. **LLM Settings (`config/llm_settings.json`):** This JSON file contains settings specifically related to the LLM predictor, such as the API endpoint, model name, prompt template, and examples.
3. **Preset Files (`Presets/*.json`):** These JSON files define supplier-specific rules and overrides. They contain patterns to interpret filenames, classify map types, handle variants, define naming conventions, and specify other source-specific behaviors.
The `configuration.py` module is responsible for loading the base settings from `config/app_settings.json` (including loading and saving the JSON content), merging them with the rules from the selected preset file, and providing the base configuration via the `load_base_config()` function. Preset values generally override core settings where applicable. Note that the old `config.py` file has been deleted.
The `configuration.py` module is responsible for loading the base settings from `config/app_settings.json`, the LLM settings from `config/llm_settings.json`, merging the base settings with the rules from the selected preset file, and providing access to all settings via the `Configuration` class. The `load_base_config()` function is still available for accessing only the `app_settings.json` content directly (e.g., for the GUI editor). Preset values generally override core settings where applicable. Note that the old `config.py` file has been deleted.
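The two-tiered load-and-merge flow described above can be sketched as follows. This is a minimal illustration only, assuming plain-dict handling; the real `configuration.py` exposes the result through the `Configuration` class, and the helper names here are not the module's actual API:

```python
import json

def load_json(path):
    # Load a JSON file, returning {} if it does not exist.
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def build_config(preset_name):
    base = load_json("config/app_settings.json")
    llm = load_json("config/llm_settings.json")
    preset = load_json(f"Presets/{preset_name}.json")
    # Preset values override base settings where keys overlap;
    # LLM settings are kept separate from the merged base/preset dict.
    merged = {**base, **preset}
    return merged, llm
```

The dict-unpacking merge (`{**base, **preset}`) gives preset keys the final word, matching the "preset values generally override core settings" rule stated above.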
## Supplier Management (`config/suppliers.json`)
@@ -24,12 +25,13 @@ The `Configuration` class is central to the new configuration system. It is resp
* **Initialization:** An instance is created with a specific `preset_name`.
* **Loading:**
* It first loads the base application settings from `config/app_settings.json`. This file now also contains the LLM-specific settings (`llm_endpoint_url`, `llm_api_key`, `llm_model_name`, `llm_temperature`, `llm_request_timeout`, `llm_predictor_prompt`, `llm_predictor_examples`).
* It then loads the specified preset JSON file from the `Presets/` directory.
* **Merging:** The loaded settings from `app_settings.json` and the preset rules are merged into a single configuration object accessible via instance attributes. Preset values generally override the base settings from `app_settings.json` where applicable.
* **Validation (`_validate_configs`):** Performs basic structural validation on the loaded settings, checking for the presence of required keys and basic data types (e.g., ensuring `map_type_mapping` is a list of dictionaries).
* **Regex Compilation (`_compile_regex_patterns`):** A crucial step for performance. It iterates through the regex patterns defined in the merged configuration (from both `app_settings.json` and the preset) and compiles them using `re.compile` (mostly case-insensitive). These compiled regex objects are stored as instance attributes (e.g., `self.compiled_map_keyword_regex`) for fast matching during file classification. It uses a helper (`_fnmatch_to_regex`) for basic wildcard (`*`, `?`) conversion in patterns.
* **LLM Settings Access:** The `Configuration` class provides getter methods (e.g., `get_llm_endpoint_url()`, `get_llm_api_key()`, `get_llm_model_name()`, `get_llm_temperature()`, `get_llm_request_timeout()`, `get_llm_predictor_prompt()`, `get_llm_predictor_examples()`) to allow components like the `LLMPredictionHandler` to easily access the necessary LLM configuration values loaded from `app_settings.json`.
* It first loads the base application settings from `config/app_settings.json`.
* It then loads the LLM-specific settings from `config/llm_settings.json`.
* Finally, it loads the specified preset JSON file from the `Presets/` directory.
* **Merging & Access:** The base settings from `app_settings.json` are merged with the preset rules. LLM settings are stored separately. All settings are accessible via instance properties (e.g., `config.target_filename_pattern`, `config.llm_endpoint_url`). Preset values generally override the base settings where applicable.
* **Validation (`_validate_configs`):** Performs basic structural validation on the loaded settings (base, LLM, and preset), checking for the presence of required keys and basic data types. Logs warnings for missing optional LLM keys.
* **Regex Compilation (`_compile_regex_patterns`):** Compiles regex patterns defined in the merged configuration (from base settings and the preset) for performance. Compiled regex objects are stored as instance attributes (e.g., `self.compiled_map_keyword_regex`).
* **LLM Settings Access:** The `Configuration` class provides direct property access (e.g., `config.llm_endpoint_url`, `config.llm_api_key`, `config.llm_model_name`, `config.llm_temperature`, `config.llm_request_timeout`, `config.llm_predictor_prompt`) plus a `config.get_llm_examples()` getter, so that components like the `LLMPredictionHandler` can easily access the necessary LLM configuration values loaded from `config/llm_settings.json`.
An instance of `Configuration` is created within each worker process (`main.process_single_asset_wrapper`) to ensure that each concurrently processed asset uses the correct, isolated configuration based on the specified preset and the base application settings.
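The per-worker isolation can be sketched like this. The `Configuration` class below is a simplified stand-in (the real one loads `app_settings.json`, `llm_settings.json`, and the preset), and `run_all` is an illustrative helper, not part of `main.py`:

```python
from concurrent.futures import ProcessPoolExecutor

class Configuration:
    # Simplified stand-in for configuration.Configuration.
    def __init__(self, preset_name):
        self.preset_name = preset_name

def process_single_asset_wrapper(args):
    # Each worker call builds its own Configuration instance, so
    # concurrently processed assets never share mutable config state.
    asset_path, preset_name = args
    config = Configuration(preset_name)
    return f"{asset_path} processed with preset {config.preset_name}"

def run_all(assets, preset_name):
    # One wrapper invocation per asset; the configuration is re-created
    # inside each worker rather than pickled and shared.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(process_single_asset_wrapper,
                             [(a, preset_name) for a in assets]))
```

Re-creating the configuration per call costs a little I/O per asset but removes any risk of cross-asset state leakage between concurrent workers.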


@@ -6,7 +6,7 @@ The LLM Predictor feature provides an alternative method for classifying asset t
## Configuration
The LLM Predictor is configured via new settings in the `config/app_settings.json` file. These settings control the behavior of the LLM interaction:
The LLM Predictor is configured via settings in the dedicated `config/llm_settings.json` file. These settings control the behavior of the LLM interaction:
- `llm_predictor_prompt`: The template for the prompt sent to the LLM. This prompt should guide the LLM to classify the asset based on its name and potentially other context. It can include placeholders that will be replaced with actual data during processing.
- `llm_endpoint_url`: The URL of the LLM API endpoint.
@@ -16,7 +16,7 @@ The LLM Predictor is configured via new settings in the `config/app_settings.jso
- `llm_request_timeout`: The maximum time (in seconds) to wait for a response from the LLM API.
- `llm_predictor_examples`: A list of example input/output pairs to include in the prompt for few-shot learning, helping the LLM understand the desired output format and classification logic.
The prompt structure is crucial for effective classification. It should clearly instruct the LLM on the task and the expected output format. Placeholders within the prompt template (e.g., `{asset_name}`) are dynamically replaced with relevant data before the request is sent.
These settings are loaded by the `Configuration` class (from `configuration.py`) along with the core `app_settings.json` and the selected preset. The prompt structure is crucial for effective classification. It should clearly instruct the LLM on the task and the expected output format. Placeholders within the prompt template (e.g., `{FILE_LIST}`) are dynamically replaced with relevant data before the request is sent.
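Placeholder substitution of this kind can be sketched as follows. The helper name is hypothetical, and the placeholder set is taken from the prompt template in this commit (`{FILE_LIST}`, `{ASSET_TYPE_DEFINITIONS}`, `{FILE_TYPE_DEFINITIONS}`, `{EXAMPLE_INPUT_OUTPUT_PAIRS}`):

```python
import json

def prepare_prompt(template, file_list, asset_defs, file_defs, examples):
    # Substitute the documented placeholders. str.replace is used rather
    # than str.format because the template itself contains literal braces
    # (embedded JSON snippets would confuse format()).
    return (template
            .replace("{FILE_LIST}", "\n".join(file_list))
            .replace("{ASSET_TYPE_DEFINITIONS}", json.dumps(asset_defs, indent=2))
            .replace("{FILE_TYPE_DEFINITIONS}", json.dumps(file_defs, indent=2))
            .replace("{EXAMPLE_INPUT_OUTPUT_PAIRS}",
                     ",\n".join(json.dumps(e, indent=2) for e in examples)))
```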
## Expected LLM Output Format (Refactored)
@@ -64,9 +64,11 @@ The `gui/llm_prediction_handler.py` module contains the `LLMPredictionHandler` c
Key Responsibilities & Methods:
- **Initialization**: Takes the source identifier, file list, and `Configuration` object.
- **`run()`**: The main method executed by the thread pool. It prepares the prompt, calls the LLM (via `LLMInteractionHandler`), parses the response, and emits the result or error.
- **Interaction**: Uses `LLMInteractionHandler` to handle the actual prompt construction and API communication (details in `03_Key_Components.md` and `llm_interaction_handler.py`).
- **Initialization**: Takes the source identifier, file list, and the main `Configuration` object (which has loaded settings from `app_settings.json`, `llm_settings.json`, and the active preset).
- **`run()`**: The main method executed by the thread pool. It prepares the prompt, calls the LLM, parses the response, and emits the result or error.
- **Prompt Preparation (`_prepare_prompt`)**: Uses the `Configuration` object (`self.config`) to access the `llm_predictor_prompt`, `asset_type_definitions`, `file_type_definitions`, and `llm_examples` to build the final prompt string.
- **API Call (`_call_llm`)**: Uses the `Configuration` object (`self.config`) to get the `llm_endpoint_url`, `llm_api_key`, `llm_model_name`, `llm_temperature`, and `llm_request_timeout` to make the API request.
- **Parsing (`_parse_llm_response`)**: Parses the LLM's JSON response (using `self.config` again to get valid asset/file types for validation) and constructs the `SourceRule` hierarchy.
- **`_parse_llm_response(response_text)`**: This method contains the **new parsing logic**:
1. **Sanitization**: Removes common non-JSON elements like comments (`//`, `/* */`) and markdown code fences (```json ... ```) from the raw `response_text` to increase the likelihood of successful JSON parsing.
2. **JSON Parsing**: Parses the sanitized string into a Python dictionary.
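The sanitize-then-parse sequence can be sketched as below. The regexes are illustrative only; the real `_parse_llm_response` additionally validates the classified types against the configuration and builds the `SourceRule` hierarchy:

```python
import json
import re

def parse_llm_response(response_text):
    # 1. Strip markdown code fences such as ```json ... ```.
    text = re.sub(r"```(?:json)?", "", response_text)
    # 2. Strip /* */ block comments (naive: assumes comment markers never
    #    appear inside JSON string values).
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # 3. Strip // line comments. Requiring line-start or whitespace before
    #    "//" avoids mangling URLs like "https://..." inside strings.
    text = re.sub(r"(^|\s)//.*$", r"\1", text, flags=re.MULTILINE)
    # 4. Parse the sanitized string into a Python dictionary.
    return json.loads(text)
```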


@@ -263,268 +263,5 @@
],
"CALCULATE_STATS_RESOLUTION": "1K",
"DEFAULT_ASSET_CATEGORY": "Surface",
"TEMP_DIR_PREFIX": "_PROCESS_ASSET_",
"llm_predictor_examples": [
{
"input": "MessyTextures/Concrete_Damage_Set/concrete_col.png\nMessyTextures/Concrete_Damage_Set/concrete_N.png\nMessyTextures/Concrete_Damage_Set/concrete_rough.jpg\nMessyTextures/Concrete_Damage_Set/height_map_concrete.tif\nMessyTextures/Concrete_Damage_Set/Thumbs.db\nMessyTextures/Fabric_Pattern/pattern_01_diffuse.tga\nMessyTextures/Fabric_Pattern/pattern_01_ao.png\nMessyTextures/Fabric_Pattern/pattern_01_normal.png\nMessyTextures/Fabric_Pattern/notes.txt\nMessyTextures/Fabric_Pattern/variant_blue_diffuse.tga\nMessyTextures/Fabric_Pattern/fabric_flat.jpg",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/concrete_col.png",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/concrete_N.png",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/concrete_rough.jpg",
"classified_file_type": "MAP_ROUGH",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/height_map_concrete.tif",
"classified_file_type": "MAP_DISP",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/Thumbs.db",
"classified_file_type": "FILE_IGNORE",
"proposed_asset_group_name": null
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/pattern_01_diffuse.tga",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/pattern_01_ao.png",
"classified_file_type": "MAP_AO",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/pattern_01_normal.png",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/notes.txt",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/variant_blue_diffuse.tga",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/fabric_flat.jpg",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "Fabric_Pattern_01"
}
],
"asset_group_classifications": {
"Concrete_Damage_Set": "Surface",
"Fabric_Pattern_01": "Surface"
}
}
},
{
"input": "SciFi_Drone/Drone_Model.fbx\nSciFi_Drone/Textures/Drone_BaseColor.png\nSciFi_Drone/Textures/Drone_Metallic.png\nSciFi_Drone/Textures/Drone_Roughness.png\nSciFi_Drone/Textures/Drone_Normal.png\nSciFi_Drone/Textures/Drone_Emissive.jpg\nSciFi_Drone/ReferenceImages/concept.jpg",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "SciFi_Drone/Drone_Model.fbx",
"classified_file_type": "MODEL",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_BaseColor.png",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Metallic.png",
"classified_file_type": "MAP_METAL",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Roughness.png",
"classified_file_type": "MAP_ROUGH",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Normal.png",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Emissive.jpg",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/ReferenceImages/concept.jpg",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "SciFi_Drone"
}
],
"asset_group_classifications": {
"SciFi_Drone": "Model"
}
}
},
{
"input": "21_hairs_deposits.tif\n22_hairs_fabric.tif\n23_hairs_fibres.tif\n24_hairs_fibres.tif\n25_bonus_isolatedFingerprints.tif\n26_bonus_isolatedPalmprint.tif\n27_metal_aluminum.tif\n28_metal_castIron.tif\n29_scratcehes_deposits_shapes.tif\n30_scratches_deposits.tif",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "21_hairs_deposits.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Deposits_21"
},
{
"relative_file_path": "22_hairs_fabric.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Fabric_22"
},
{
"relative_file_path": "23_hairs_fibres.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Fibres_23"
},
{
"relative_file_path": "24_hairs_fibres.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Fibres_24"
},
{
"relative_file_path": "25_bonus_isolatedFingerprints.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Bonus_IsolatedFingerprints_25"
},
{
"relative_file_path": "26_bonus_isolatedPalmprint.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Bonus_IsolatedPalmprint_26"
},
{
"relative_file_path": "27_metal_aluminum.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Metal_Aluminum_27"
},
{
"relative_file_path": "28_metal_castIron.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Metal_CastIron_28"
},
{
"relative_file_path": "29_scratcehes_deposits_shapes.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Scratches_Deposits_Shapes_29"
},
{
"relative_file_path": "30_scratches_deposits.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Scratches_Deposits_30"
}
],
"asset_group_classifications": {
"Hairs_Deposits_21": "UtilityMap",
"Hairs_Fabric_22": "UtilityMap",
"Hairs_Fibres_23": "UtilityMap",
"Hairs_Fibres_24": "UtilityMap",
"Bonus_IsolatedFingerprints_25": "UtilityMap",
"Bonus_IsolatedPalmprint_26": "UtilityMap",
"Metal_Aluminum_27": "UtilityMap",
"Metal_CastIron_28": "UtilityMap",
"Scratches_Deposits_Shapes_29": "UtilityMap",
"Scratches_Deposits_30": "UtilityMap"
}
}
},
{
"input": "Part1/TextureSupply_Boards001_A_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_A_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_B_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_B_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_C_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_C_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_D_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_D_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_E_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_E_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_F_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_F_28x300cm-Normal.jpg",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "Part1/TextureSupply_Boards001_A_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_A"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_A_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_A"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_B_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_B"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_B_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_B"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_C_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_C"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_C_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_C"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_D_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_D"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_D_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_D"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_E_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_E"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_E_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_E"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_F_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_F"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_F_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_F"
}
],
"asset_group_classifications": {
"Boards001_A": "Surface",
"Boards001_B": "Surface",
"Boards001_C": "Surface",
"Boards001_D": "Surface",
"Boards001_E": "Surface",
"Boards001_F": "Surface"
}
}
}
],
"llm_endpoint_url": "https://api.llm.gestaltservers.com/v1/chat/completions",
"llm_api_key": "",
"llm_model_name": "qwen2.5-coder:3b",
"llm_temperature": 0.5,
"llm_request_timeout": 120,
"llm_predictor_prompt": "You are an expert asset classification system. Your task is to analyze a list of file paths, understand their relationships based on naming and directory structure, and output a structured JSON object that classifies each file individually and then classifies the logical asset groups they belong to.\n\nDefinitions:\n\nAsset Types: These define the overall category of a logical asset group. Use one of the following keys when classifying asset groups:\njson\n{ASSET_TYPE_DEFINITIONS}\n\n\nFile Types: These define the specific purpose of each individual file. Use one of the following keys when classifying individual files:\njson\n{FILE_TYPE_DEFINITIONS}\n\n\nCore Task & Logic:\n\n1. **Individual File Analysis:**\n * Examine each `relative_file_path` in the input `FILE_LIST`.\n * For EACH file, determine its most likely `classified_file_type` using the `FILE_TYPE_DEFINITIONS`. Pay attention to filename suffixes, keywords, and extensions. Use `FILE_IGNORE` for files like `Thumbs.db` or `.DS_Store`. Use `EXTRA` for previews, metadata, or unidentifiable maps.\n * For EACH file, propose a logical `proposed_asset_group_name` (string). This name should represent the asset the file likely belongs to, based on common base names (e.g., `WoodFloor01` from `WoodFloor01_col.png`, `WoodFloor01_nrm.png`) or directory structure (e.g., `SciFi_Drone` for files within that folder).\n * Files that seem to be standalone utility maps (like `scratches.png`, `FlowMap.tif`) should get a unique group name derived from their filename (e.g., `Scratches`, `FlowMap`).\n * If a file doesn't seem to belong to any logical group (e.g., a stray readme file in the root), you can propose `null` or a generic name like `Miscellaneous`.\n * Be consistent with the proposed names for files belonging to the same logical asset.\n * Populate the `individual_file_analysis` array with one object for *every* file in the input list, containing `relative_file_path`, `classified_file_type`, and `proposed_asset_group_name`.\n\n2. **Asset Group Classification:**\n * Collect all unique, non-null `proposed_asset_group_name` values generated in the previous step.\n * For EACH unique group name, determine the overall `asset_type` (using `ASSET_TYPE_DEFINITIONS`) based on the types of files assigned to that group name in the `individual_file_analysis`.\n * Example: If files proposed as `AssetGroup1` include `MAP_COL`, `MAP_NRM`, `MAP_ROUGH`, classify `AssetGroup1` as `Surface`.\n * Example: If files proposed as `AssetGroup2` include `MODEL` and texture maps, classify `AssetGroup2` as `Model`.\n * Example: If `AssetGroup3` only has one file classified as `MAP_IMPERFECTION`, classify `AssetGroup3` as `UtilityMap`.\n * Populate the `asset_group_classifications` dictionary, mapping each unique `proposed_asset_group_name` to its determined `asset_type`.\n\nInput File List:\n\ntext\n{FILE_LIST}\n\n\nOutput Format:\n\nYour response MUST be ONLY a single JSON object. You MAY include comments (using // or /* */) within the JSON structure for clarification if needed, but the core structure must be valid JSON. Do NOT include any text, explanations, or introductory phrases before or after the JSON object itself. Ensure all strings are correctly quoted and escaped.\n\nCRITICAL: The output JSON structure must strictly adhere to the following format:\n\n```json\n{\n \"individual_file_analysis\": [\n {\n // Optional comment about this file\n \"relative_file_path\": \"string\", // Exact relative path from the input list\n \"classified_file_type\": \"string\", // Key from FILE_TYPE_DEFINITIONS\n \"proposed_asset_group_name\": \"string_or_null\" // Your suggested group name for this file\n }\n // ... one object for EVERY file in the input list\n ],\n \"asset_group_classifications\": {\n // Dictionary mapping unique proposed group names to asset types\n \"ProposedGroupName1\": \"string\", // Key: proposed_asset_group_name, Value: Key from ASSET_TYPE_DEFINITIONS\n \"ProposedGroupName2\": \"string\"\n // ... one entry for each unique, non-null proposed_asset_group_name\n }\n}\n```\n\nExamples:\n\nHere are examples of input file lists and the desired JSON output, illustrating the two-part structure:\n\njson\n[\n {EXAMPLE_INPUT_OUTPUT_PAIRS}\n]\n\n\nNow, process the provided FILE_LIST and generate ONLY the JSON output according to these instructions. Remember to include an entry in `individual_file_analysis` for every single input file path."
"TEMP_DIR_PREFIX": "_PROCESS_ASSET_"
}

config/llm_settings.json Normal file

@@ -0,0 +1,265 @@
{
"llm_predictor_examples": [
{
"input": "MessyTextures/Concrete_Damage_Set/concrete_col.png\nMessyTextures/Concrete_Damage_Set/concrete_N.png\nMessyTextures/Concrete_Damage_Set/concrete_rough.jpg\nMessyTextures/Concrete_Damage_Set/height_map_concrete.tif\nMessyTextures/Concrete_Damage_Set/Thumbs.db\nMessyTextures/Fabric_Pattern/pattern_01_diffuse.tga\nMessyTextures/Fabric_Pattern/pattern_01_ao.png\nMessyTextures/Fabric_Pattern/pattern_01_normal.png\nMessyTextures/Fabric_Pattern/notes.txt\nMessyTextures/Fabric_Pattern/variant_blue_diffuse.tga\nMessyTextures/Fabric_Pattern/fabric_flat.jpg",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/concrete_col.png",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/concrete_N.png",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/concrete_rough.jpg",
"classified_file_type": "MAP_ROUGH",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/height_map_concrete.tif",
"classified_file_type": "MAP_DISP",
"proposed_asset_group_name": "Concrete_Damage_Set"
},
{
"relative_file_path": "MessyTextures/Concrete_Damage_Set/Thumbs.db",
"classified_file_type": "FILE_IGNORE",
"proposed_asset_group_name": null
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/pattern_01_diffuse.tga",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/pattern_01_ao.png",
"classified_file_type": "MAP_AO",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/pattern_01_normal.png",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/notes.txt",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/variant_blue_diffuse.tga",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Fabric_Pattern_01"
},
{
"relative_file_path": "MessyTextures/Fabric_Pattern/fabric_flat.jpg",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "Fabric_Pattern_01"
}
],
"asset_group_classifications": {
"Concrete_Damage_Set": "Surface",
"Fabric_Pattern_01": "Surface"
}
}
},
{
"input": "SciFi_Drone/Drone_Model.fbx\nSciFi_Drone/Textures/Drone_BaseColor.png\nSciFi_Drone/Textures/Drone_Metallic.png\nSciFi_Drone/Textures/Drone_Roughness.png\nSciFi_Drone/Textures/Drone_Normal.png\nSciFi_Drone/Textures/Drone_Emissive.jpg\nSciFi_Drone/ReferenceImages/concept.jpg",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "SciFi_Drone/Drone_Model.fbx",
"classified_file_type": "MODEL",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_BaseColor.png",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Metallic.png",
"classified_file_type": "MAP_METAL",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Roughness.png",
"classified_file_type": "MAP_ROUGH",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Normal.png",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/Textures/Drone_Emissive.jpg",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "SciFi_Drone"
},
{
"relative_file_path": "SciFi_Drone/ReferenceImages/concept.jpg",
"classified_file_type": "EXTRA",
"proposed_asset_group_name": "SciFi_Drone"
}
],
"asset_group_classifications": {
"SciFi_Drone": "Model"
}
}
},
{
"input": "21_hairs_deposits.tif\n22_hairs_fabric.tif\n23_hairs_fibres.tif\n24_hairs_fibres.tif\n25_bonus_isolatedFingerprints.tif\n26_bonus_isolatedPalmprint.tif\n27_metal_aluminum.tif\n28_metal_castIron.tif\n29_scratcehes_deposits_shapes.tif\n30_scratches_deposits.tif",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "21_hairs_deposits.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Deposits_21"
},
{
"relative_file_path": "22_hairs_fabric.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Fabric_22"
},
{
"relative_file_path": "23_hairs_fibres.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Fibres_23"
},
{
"relative_file_path": "24_hairs_fibres.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Hairs_Fibres_24"
},
{
"relative_file_path": "25_bonus_isolatedFingerprints.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Bonus_IsolatedFingerprints_25"
},
{
"relative_file_path": "26_bonus_isolatedPalmprint.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Bonus_IsolatedPalmprint_26"
},
{
"relative_file_path": "27_metal_aluminum.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Metal_Aluminum_27"
},
{
"relative_file_path": "28_metal_castIron.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Metal_CastIron_28"
},
{
"relative_file_path": "29_scratcehes_deposits_shapes.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Scratches_Deposits_Shapes_29"
},
{
"relative_file_path": "30_scratches_deposits.tif",
"classified_file_type": "MAP_IMPERFECTION",
"proposed_asset_group_name": "Scratches_Deposits_30"
}
],
"asset_group_classifications": {
"Hairs_Deposits_21": "UtilityMap",
"Hairs_Fabric_22": "UtilityMap",
"Hairs_Fibres_23": "UtilityMap",
"Hairs_Fibres_24": "UtilityMap",
"Bonus_IsolatedFingerprints_25": "UtilityMap",
"Bonus_IsolatedPalmprint_26": "UtilityMap",
"Metal_Aluminum_27": "UtilityMap",
"Metal_CastIron_28": "UtilityMap",
"Scratches_Deposits_Shapes_29": "UtilityMap",
"Scratches_Deposits_30": "UtilityMap"
}
}
},
{
"input": "Part1/TextureSupply_Boards001_A_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_A_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_B_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_B_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_C_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_C_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_D_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_D_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_E_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_E_28x300cm-Normal.jpg\nPart1/TextureSupply_Boards001_F_28x300cm-Albedo.jpg\nPart1/TextureSupply_Boards001_F_28x300cm-Normal.jpg",
"output": {
"individual_file_analysis": [
{
"relative_file_path": "Part1/TextureSupply_Boards001_A_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_A"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_A_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_A"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_B_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_B"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_B_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_B"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_C_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_C"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_C_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_C"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_D_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_D"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_D_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_D"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_E_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_E"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_E_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_E"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_F_28x300cm-Albedo.jpg",
"classified_file_type": "MAP_COL",
"proposed_asset_group_name": "Boards001_F"
},
{
"relative_file_path": "Part1/TextureSupply_Boards001_F_28x300cm-Normal.jpg",
"classified_file_type": "MAP_NRM",
"proposed_asset_group_name": "Boards001_F"
}
],
"asset_group_classifications": {
"Boards001_A": "Surface",
"Boards001_B": "Surface",
"Boards001_C": "Surface",
"Boards001_D": "Surface",
"Boards001_E": "Surface",
"Boards001_F": "Surface"
}
}
}
],
"llm_endpoint_url": "https://api.llm.gestaltservers.com/v1/chat/completions",
"llm_api_key": "",
"llm_model_name": "qwen2.5-coder:3b",
"llm_temperature": 0.5,
"llm_request_timeout": 120,
"llm_predictor_prompt": "You are an expert asset classification system. Your task is to analyze a list of file paths, understand their relationships based on naming and directory structure, and output a structured JSON object that classifies each file individually and then classifies the logical asset groups they belong to.\n\nDefinitions:\n\nAsset Types: These define the overall category of a logical asset group. Use one of the following keys when classifying asset groups:\njson\n{ASSET_TYPE_DEFINITIONS}\n\n\nFile Types: These define the specific purpose of each individual file. Use one of the following keys when classifying individual files:\njson\n{FILE_TYPE_DEFINITIONS}\n\n\nCore Task & Logic:\n\n1. **Individual File Analysis:**\n * Examine each `relative_file_path` in the input `FILE_LIST`.\n * For EACH file, determine its most likely `classified_file_type` using the `FILE_TYPE_DEFINITIONS`. Pay attention to filename suffixes, keywords, and extensions. Use `FILE_IGNORE` for files like `Thumbs.db` or `.DS_Store`. Use `EXTRA` for previews, metadata, or unidentifiable maps.\n * For EACH file, propose a logical `proposed_asset_group_name` (string). 
This name should represent the asset the file likely belongs to, based on common base names (e.g., `WoodFloor01` from `WoodFloor01_col.png`, `WoodFloor01_nrm.png`) or directory structure (e.g., `SciFi_Drone` for files within that folder).\n * Files that seem to be standalone utility maps (like `scratches.png`, `FlowMap.tif`) should get a unique group name derived from their filename (e.g., `Scratches`, `FlowMap`).\n * If a file doesn't seem to belong to any logical group (e.g., a stray readme file in the root), you can propose `null` or a generic name like `Miscellaneous`.\n * Be consistent with the proposed names for files belonging to the same logical asset.\n * Populate the `individual_file_analysis` array with one object for *every* file in the input list, containing `relative_file_path`, `classified_file_type`, and `proposed_asset_group_name`.\n\n2. **Asset Group Classification:**\n * Collect all unique, non-null `proposed_asset_group_name` values generated in the previous step.\n * For EACH unique group name, determine the overall `asset_type` (using `ASSET_TYPE_DEFINITIONS`) based on the types of files assigned to that group name in the `individual_file_analysis`.\n * Example: If files proposed as `AssetGroup1` include `MAP_COL`, `MAP_NRM`, `MAP_ROUGH`, classify `AssetGroup1` as `Surface`.\n * Example: If files proposed as `AssetGroup2` include `MODEL` and texture maps, classify `AssetGroup2` as `Model`.\n * Example: If `AssetGroup3` only has one file classified as `MAP_IMPERFECTION`, classify `AssetGroup3` as `UtilityMap`.\n * Populate the `asset_group_classifications` dictionary, mapping each unique `proposed_asset_group_name` to its determined `asset_type`.\n\nInput File List:\n\ntext\n{FILE_LIST}\n\n\nOutput Format:\n\nYour response MUST be ONLY a single JSON object. You MAY include comments (using // or /* */) within the JSON structure for clarification if needed, but the core structure must be valid JSON. 
Do NOT include any text, explanations, or introductory phrases before or after the JSON object itself. Ensure all strings are correctly quoted and escaped.\n\nCRITICAL: The output JSON structure must strictly adhere to the following format:\n\n```json\n{\n \"individual_file_analysis\": [\n {\n // Optional comment about this file\n \"relative_file_path\": \"string\", // Exact relative path from the input list\n \"classified_file_type\": \"string\", // Key from FILE_TYPE_DEFINITIONS\n \"proposed_asset_group_name\": \"string_or_null\" // Your suggested group name for this file\n }\n // ... one object for EVERY file in the input list\n ],\n \"asset_group_classifications\": {\n // Dictionary mapping unique proposed group names to asset types\n \"ProposedGroupName1\": \"string\", // Key: proposed_asset_group_name, Value: Key from ASSET_TYPE_DEFINITIONS\n \"ProposedGroupName2\": \"string\"\n // ... one entry for each unique, non-null proposed_asset_group_name\n }\n}\n```\n\nExamples:\n\nHere are examples of input file lists and the desired JSON output, illustrating the two-part structure:\n\njson\n[\n {EXAMPLE_INPUT_OUTPUT_PAIRS}\n]\n\n\nNow, process the provided FILE_LIST and generate ONLY the JSON output according to these instructions. Remember to include an entry in `individual_file_analysis` for every single input file path."
}
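Pulling the pieces together, a minimal `config/llm_settings.json` contains exactly the keys the validator checks for (a sketch; the `llm_predictor_prompt` and `llm_predictor_examples` values are trimmed placeholders here, not the full content shown above):

```json
{
    "llm_predictor_examples": [],
    "llm_endpoint_url": "https://api.llm.gestaltservers.com/v1/chat/completions",
    "llm_api_key": "",
    "llm_model_name": "qwen2.5-coder:3b",
    "llm_temperature": 0.5,
    "llm_request_timeout": 120,
    "llm_predictor_prompt": "You are an expert asset classification system. ..."
}
```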


@@ -13,6 +13,7 @@ log = logging.getLogger(__name__) # Use logger defined in main.py
# Assumes config/ and presets/ are relative to this file's location
BASE_DIR = Path(__file__).parent
APP_SETTINGS_PATH = BASE_DIR / "config" / "app_settings.json"
LLM_SETTINGS_PATH = BASE_DIR / "config" / "llm_settings.json" # Added LLM settings path
PRESETS_DIR = BASE_DIR / "Presets"
# --- Custom Exception ---
@@ -89,6 +90,7 @@ class Configuration:
log.debug(f"Initializing Configuration with preset: '{preset_name}'")
self.preset_name = preset_name
self._core_settings: dict = self._load_core_config()
self._llm_settings: dict = self._load_llm_config() # Load LLM settings
self._preset_settings: dict = self._load_preset(preset_name)
self._validate_configs()
self._compile_regex_patterns() # Compile regex after validation
@@ -209,6 +211,26 @@ class Configuration:
except Exception as e:
raise ConfigurationError(f"Failed to read core configuration file {APP_SETTINGS_PATH}: {e}")
def _load_llm_config(self) -> dict:
"""Loads settings from the llm_settings.json file."""
log.debug(f"Loading LLM config from: {LLM_SETTINGS_PATH}")
if not LLM_SETTINGS_PATH.is_file():
# Log a warning but don't raise an error, allow fallback if possible
log.warning(f"LLM configuration file not found: {LLM_SETTINGS_PATH}. LLM features might be disabled or use defaults.")
return {} # Return empty dict if file not found
try:
with open(LLM_SETTINGS_PATH, 'r', encoding='utf-8') as f:
settings = json.load(f)
log.debug(f"LLM config loaded successfully.")
return settings
except json.JSONDecodeError as e:
log.error(f"Failed to parse LLM configuration file {LLM_SETTINGS_PATH}: Invalid JSON - {e}")
return {} # Return empty dict on parse error
except Exception as e:
log.error(f"Failed to read LLM configuration file {LLM_SETTINGS_PATH}: {e}")
return {} # Return empty dict on other read errors
def _load_preset(self, preset_name: str) -> dict:
"""Loads the specified preset JSON file."""
log.debug(f"Loading preset: '{preset_name}' from {PRESETS_DIR}")
@@ -263,8 +285,22 @@
raise ConfigurationError("Core config 'IMAGE_RESOLUTIONS' must be a dictionary.")
if not isinstance(self._core_settings.get('STANDARD_MAP_TYPES'), list):
raise ConfigurationError("Core config 'STANDARD_MAP_TYPES' must be a list.")
# LLM settings validation (check that expected keys exist if the file was loaded)
if self._llm_settings: # Only validate if LLM settings were loaded
required_llm_keys = [
"llm_predictor_examples", "llm_endpoint_url", "llm_api_key",
"llm_model_name", "llm_temperature", "llm_request_timeout",
"llm_predictor_prompt"
]
for key in required_llm_keys:
if key not in self._llm_settings:
# Warn instead of raising so the app keeps partial functionality
log.warning(f"LLM config is missing recommended key: '{key}'. LLM features might not work correctly.")
# Add more checks as necessary
log.debug("Configuration validation passed.")
# --- Accessor Methods/Properties ---
@@ -409,13 +445,40 @@
return list(self.get_file_type_definitions_with_examples().keys())
def get_llm_examples(self) -> list:
"""Returns the list of LLM input/output examples from core settings."""
return self._core_settings.get('llm_predictor_examples', [])
"""Returns the list of LLM input/output examples from LLM settings."""
# Use empty list as fallback if LLM settings file is missing/invalid
return self._llm_settings.get('llm_predictor_examples', [])
@property
def llm_predictor_prompt(self) -> str:
"""Returns the LLM predictor prompt string from LLM settings."""
return self._llm_settings.get('llm_predictor_prompt', '') # Fallback to empty string
@property
def llm_endpoint_url(self) -> str:
"""Returns the LLM endpoint URL from LLM settings."""
return self._llm_settings.get('llm_endpoint_url', '')
@property
def llm_api_key(self) -> str:
"""Returns the LLM API key from LLM settings."""
return self._llm_settings.get('llm_api_key', '')
@property
def llm_model_name(self) -> str:
"""Returns the LLM model name from LLM settings."""
return self._llm_settings.get('llm_model_name', '')
@property
def llm_temperature(self) -> float:
"""Returns the LLM temperature from LLM settings."""
return self._llm_settings.get('llm_temperature', 0.5) # Default temperature
@property
def llm_request_timeout(self) -> int:
"""Returns the LLM request timeout in seconds from LLM settings."""
return self._llm_settings.get('llm_request_timeout', 120) # Default timeout
def get_setting(self, key: str, default: any = None) -> any:
"""Gets a specific setting by key from the core settings."""
# Note: This accesses _core_settings directly, not combined/preset settings.
return self._core_settings.get(key, default)
# --- Standalone Base Config Functions ---
def load_base_config() -> dict:


@@ -7,7 +7,8 @@ from PySide6.QtCore import QObject, Signal, QThread, Slot, QTimer
# --- Backend Imports ---
# Assuming these might be needed based on MainWindow's usage
try:
from configuration import Configuration, ConfigurationError, load_base_config
# Removed load_base_config import
from configuration import Configuration, ConfigurationError
from .llm_prediction_handler import LLMPredictionHandler # Backend handler
from rule_structure import SourceRule # For signal emission type hint
except ImportError as e:
@@ -16,6 +17,7 @@ except ImportError as e:
load_base_config = None
ConfigurationError = Exception
SourceRule = None # Define as None if import fails
Configuration = None # Define as None if import fails
log = logging.getLogger(__name__)
@@ -114,16 +116,6 @@ class LLMInteractionHandler(QObject):
# Extract file list if not provided (needed for re-interpretation calls)
if file_list is None:
log.debug(f"File list not provided for {input_path_str}, extracting...")
# Need access to MainWindow's _extract_file_list or reimplement
# For now, assume MainWindow provides it or pass it during queueing
# Let's assume file_list is always provided correctly for now.
# If extraction fails before queueing, it won't reach here.
# If extraction needs to happen here, MainWindow ref is needed.
# Re-evaluating: MainWindow._extract_file_list is complex.
# It's better if the caller (MainWindow) extracts and passes the list.
# We'll modify queue_llm_request to require a non-None list eventually,
# or pass the main_window ref to call its extraction method.
# Let's pass main_window ref for now.
if hasattr(self.main_window, '_extract_file_list'):
file_list = self.main_window._extract_file_list(input_path_str)
if file_list is None:
@@ -131,11 +123,6 @@
log.error(error_msg)
self.llm_status_update.emit(f"Error extracting files for {os.path.basename(input_path_str)}")
self.llm_prediction_error.emit(input_path_str, error_msg) # Signal error
# If called as part of a queue, we need to ensure the next item is processed.
# _reset_llm_thread_references handles this via the finished signal,
# but if the thread never starts, we need to trigger manually.
# This case should ideally be caught before calling _start_llm_prediction.
# We'll assume the queue logic handles failed extraction before calling this.
return # Stop if extraction failed
else:
error_msg = f"MainWindow reference does not have _extract_file_list method."
@@ -153,52 +140,27 @@
self.llm_prediction_error.emit(input_path_str, error_msg)
return
# --- Load Base Config for LLM Settings ---
if load_base_config is None:
log.critical("LLM Error: load_base_config function not available.")
self.llm_status_update.emit("LLM Error: Cannot load base configuration.")
self.llm_prediction_error.emit(input_path_str, "load_base_config function not available.")
# --- Get Configuration Object ---
if not hasattr(self.main_window, 'config') or not isinstance(self.main_window.config, Configuration):
error_msg = "LLM Error: Main window does not have a valid Configuration object."
log.critical(error_msg)
self.llm_status_update.emit("LLM Error: Cannot access application configuration.")
self.llm_prediction_error.emit(input_path_str, error_msg)
return
try:
base_config = load_base_config()
if not base_config:
raise ConfigurationError("Failed to load base configuration (app_settings.json).")
llm_settings = {
"llm_endpoint_url": base_config.get('llm_endpoint_url'),
"api_key": base_config.get('llm_api_key'),
"model_name": base_config.get('llm_model_name', 'gemini-pro'),
"prompt_template_content": base_config.get('llm_predictor_prompt'),
"asset_types": base_config.get('ASSET_TYPE_DEFINITIONS', {}),
"file_types": base_config.get('FILE_TYPE_DEFINITIONS', {}),
"examples": base_config.get('llm_predictor_examples', [])
}
except ConfigurationError as e:
log.error(f"LLM Configuration Error: {e}")
self.llm_status_update.emit(f"LLM Config Error: {e}")
self.llm_prediction_error.emit(input_path_str, f"LLM Configuration Error: {e}")
# Optionally show a QMessageBox via main_window ref if critical
# self.main_window.show_critical_error("LLM Config Error", str(e))
return
except Exception as e:
log.exception(f"Unexpected error loading LLM configuration: {e}")
self.llm_status_update.emit(f"LLM Config Error: {e}")
self.llm_prediction_error.emit(input_path_str, f"Unexpected error loading LLM config: {e}")
return
# --- End Config Loading ---
config = self.main_window.config # Get the config object
# --- Check if Handler Class is Available ---
if LLMPredictionHandler is None:
log.critical("LLMPredictionHandler class not available.")
self.llm_status_update.emit("LLM Error: Prediction handler component missing.")
self.llm_prediction_error.emit(input_path_str, "LLMPredictionHandler class not available.")
return
# Clean up previous thread/handler if any exist (should not happen if queue logic is correct)
# --- Clean up previous thread/handler if necessary ---
if self.llm_prediction_thread and self.llm_prediction_thread.isRunning():
log.warning("Warning: Previous LLM prediction thread still running when trying to start new one. This indicates a potential logic error.")
# Attempt graceful shutdown (might need more robust handling)
log.warning("Previous LLM prediction thread still running when trying to start a new one. Attempting cleanup.")
if self.llm_prediction_handler:
# Assuming LLMPredictionHandler has a cancel method or similar
if hasattr(self.llm_prediction_handler, 'cancel'):
self.llm_prediction_handler.cancel()
self.llm_prediction_thread.quit()
@@ -206,7 +168,6 @@
log.warning("LLM thread did not quit gracefully. Forcing termination.")
self.llm_prediction_thread.terminate()
self.llm_prediction_thread.wait() # Wait after terminate
# Reset references after ensuring termination
self.llm_prediction_thread = None
self.llm_prediction_handler = None
@@ -214,8 +175,10 @@
log.info(f"Starting LLM prediction thread for source: {input_path_str} with {len(file_list)} files.")
self.llm_status_update.emit(f"Starting LLM interpretation for {input_path_obj.name}...")
self.llm_prediction_thread = QThread(self.main_window) # Parent thread to main window's thread? Or self? Let's try self.
self.llm_prediction_handler = LLMPredictionHandler(input_path_str, file_list, llm_settings)
# --- Create Thread and Handler ---
self.llm_prediction_thread = QThread(self) # Parent thread to self
# Pass the Configuration object directly
self.llm_prediction_handler = LLMPredictionHandler(input_path_str, file_list, config)
self.llm_prediction_handler.moveToThread(self.llm_prediction_thread)
# Connect signals from handler to *internal* slots or directly emit signals


@@ -14,8 +14,8 @@ from rule_structure import SourceRule, AssetRule, FileRule # Ensure AssetRule an
# Assuming configuration loads app_settings.json
# Adjust the import path if necessary
# Removed Configuration import, will use load_base_config if needed or passed settings
# from configuration import Configuration
# Import Configuration class
from configuration import Configuration
# from configuration import load_base_config # No longer needed here
from .base_prediction_handler import BasePredictionHandler # Import base class
@@ -28,7 +28,7 @@ class LLMPredictionHandler(BasePredictionHandler):
"""
# Signals (prediction_ready, prediction_error, status_update) are inherited
def __init__(self, input_source_identifier: str, file_list: list, llm_settings: dict, parent: QObject = None):
def __init__(self, input_source_identifier: str, file_list: list, config: Configuration, parent: QObject = None):
"""
Initializes the LLM handler.
@@ -36,16 +36,15 @@
input_source_identifier: The unique identifier for the input source (e.g., file path).
file_list: A list of *relative* file paths extracted from the input source.
(LLM expects relative paths based on the prompt template).
llm_settings: A dictionary containing necessary LLM configuration
(endpoint_url, api_key, prompt_template_content, etc.).
config: The loaded Configuration object containing all settings.
parent: The parent QObject.
"""
super().__init__(input_source_identifier, parent)
# input_source_identifier is stored by the base class as self.input_source_identifier
self.file_list = file_list # Store the provided relative file list
self.llm_settings = llm_settings # Store the settings dictionary
self.endpoint_url = self.llm_settings.get('llm_endpoint_url')
self.api_key = self.llm_settings.get('llm_api_key')
self.config = config # Store the Configuration object
# Access LLM settings via self.config properties when needed
# e.g., self.config.llm_endpoint_url, self.config.llm_api_key
# _is_running and _is_cancelled are handled by the base class
# The run() and cancel() slots are provided by the base class.
@@ -128,28 +127,17 @@
"""
Prepares the full prompt string to send to the LLM using stored settings.
"""
# Access settings from the stored dictionary
prompt_template = self.llm_settings.get('prompt_template_content')
# Access settings via the Configuration object
prompt_template = self.config.llm_predictor_prompt
if not prompt_template:
# Attempt to fall back to reading the default file path if content is missing
default_template_path = 'llm_prototype/prompt_template.txt'
print(f"Warning: 'prompt_template_content' missing in llm_settings. Falling back to reading default file: {default_template_path}")
try:
with open(default_template_path, 'r', encoding='utf-8') as f:
prompt_template = f.read()
except FileNotFoundError:
raise ValueError(f"LLM predictor prompt template content missing in settings and default file not found at: {default_template_path}")
except Exception as e:
raise ValueError(f"Error reading default LLM prompt template file {default_template_path}: {e}")
if not prompt_template: # Final check after potential fallback
raise ValueError("LLM predictor prompt template content is empty or could not be loaded.")
# Config object should handle defaults or raise error during init if critical prompt is missing
raise ValueError("LLM predictor prompt template content is empty or could not be loaded from configuration.")
# Access definitions and examples from the settings dictionary
asset_defs = json.dumps(self.llm_settings.get('asset_types', {}), indent=4)
file_defs = json.dumps(self.llm_settings.get('file_types', {}), indent=4)
examples = json.dumps(self.llm_settings.get('examples', []), indent=2)
# Access definitions and examples via Configuration object methods/properties
asset_defs = json.dumps(self.config.get_asset_type_definitions(), indent=4)
file_defs = json.dumps(self.config.get_file_type_definitions_with_examples(), indent=4)
examples = json.dumps(self.config.get_llm_examples(), indent=2)
# Format *relative* file list as a single string with newlines
file_list_str = "\n".join(relative_file_list)
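The placeholder substitution that presumably follows (it falls outside this hunk) can be sketched as below; `str.replace` is assumed rather than `str.format`, since the template is full of literal JSON braces that would confuse format-string parsing:

```python
def fill_prompt(template: str, asset_defs: str, file_defs: str,
                examples: str, file_list_str: str) -> str:
    """Substitute the prompt template's placeholders with JSON-serialized
    definitions, examples, and the newline-joined relative file list."""
    return (template
            .replace("{ASSET_TYPE_DEFINITIONS}", asset_defs)
            .replace("{FILE_TYPE_DEFINITIONS}", file_defs)
            .replace("{EXAMPLE_INPUT_OUTPUT_PAIRS}", examples)
            .replace("{FILE_LIST}", file_list_str))
```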
@@ -177,32 +165,34 @@
ValueError: If the endpoint URL is not configured or the response is invalid.
requests.exceptions.RequestException: For other request-related errors.
"""
if not self.endpoint_url:
endpoint_url = self.config.llm_endpoint_url # Get from config
if not endpoint_url:
raise ValueError("LLM endpoint URL is not configured in settings.")
headers = {
"Content-Type": "application/json",
}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
api_key = self.config.llm_api_key # Get from config
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
# Construct payload based on OpenAI Chat Completions format
payload = {
# Use configured model name, default to 'local-model'
"model": self.llm_settings.get("llm_model_name", "local-model"),
"model": self.config.llm_model_name or "local-model", # Use config property, fallback
"messages": [{"role": "user", "content": prompt}],
# Use configured temperature, default to 0.5
"temperature": self.llm_settings.get("llm_temperature", 0.5),
"temperature": self.config.llm_temperature, # Use config property (has default)
# Add max_tokens if needed/configurable:
# "max_tokens": self.llm_settings.get("llm_max_tokens", 1024),
# "max_tokens": self.config.llm_max_tokens, # Example if added to config
# Ensure the LLM is instructed to return JSON in the prompt itself
# Some models/endpoints support a specific json mode:
# "response_format": { "type": "json_object" } # If supported by endpoint
}
# Status update emitted by _perform_prediction before calling this
# self.status_update.emit(f"Sending request to LLM at {self.endpoint_url}...")
print(f"--- Calling LLM API: {self.endpoint_url} ---")
# self.status_update.emit(f"Sending request to LLM at {endpoint_url}...")
print(f"--- Calling LLM API: {endpoint_url} ---")
# print(f"--- Payload Preview ---\n{json.dumps(payload, indent=2)[:500]}...\n--- END Payload Preview ---")
# Note: Exceptions raised here (Timeout, RequestException, ValueError)
@@ -210,10 +200,10 @@
# Make the POST request with a timeout
response = requests.post(
self.endpoint_url,
endpoint_url,
headers=headers,
json=payload,
timeout=self.llm_settings.get("llm_request_timeout", 120)
timeout=self.config.llm_request_timeout # Use config property (has default)
)
response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx)
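After `raise_for_status()`, the code not shown in this hunk must pull the generated text out of the Chat Completions payload. A minimal sketch of that step, assuming the standard `choices[0].message.content` shape mentioned above:

```python
def extract_llm_content(response_json: dict) -> str:
    """Extract the model's reply text from an OpenAI-style
    chat-completions response body."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("LLM response contained no 'choices' entry.")
    return choices[0].get("message", {}).get("content", "")
```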
@@ -328,8 +318,8 @@
# --- Prepare for Rule Creation ---
source_rule = SourceRule(input_path=self.input_source_identifier)
valid_asset_types = list(self.llm_settings.get('asset_types', {}).keys())
valid_file_types = list(self.llm_settings.get('file_types', {}).keys())
valid_asset_types = self.config.get_asset_type_keys() # Use config method
valid_file_types = self.config.get_file_type_keys() # Use config method
asset_rules_map: Dict[str, AssetRule] = {} # Maps group_name to AssetRule
# --- Process Individual Files and Build Rules ---