Dedicated LLM settings file - UNTESTED!

2025-05-04 13:24:10 +02:00
parent 01c8f68ea0
commit 336d698f9b
7 changed files with 403 additions and 381 deletions
@@ -6,10 +6,11 @@ This document provides technical details about the configuration system and the

 The tool utilizes a two-tiered configuration system managed by the `configuration.py` module:

-1.  **Application Settings (`config/app_settings.json`):** This JSON file defines the core global default settings, constants, and rules that apply generally across different asset sources. Examples include default output paths, standard image resolutions, map merge rules, output format rules, Blender executable paths, and default map types. It also centrally defines metadata for allowed asset and file types. Key sections include `FILE_TYPE_DEFINITIONS`, `ASSET_TYPE_DEFINITIONS`, and `MAP_MERGE_RULES`.
-2.  **Preset Files (`Presets/*.json`):** These JSON files define supplier-specific rules and overrides. They contain patterns (often regular expressions) to interpret filenames, classify map types, handle variants, define naming conventions, and specify other source-specific behaviors.
+1.  **Application Settings (`config/app_settings.json`):** This JSON file defines the core global default settings, constants, and rules that apply generally across different asset sources (e.g., default output paths, standard image resolutions, map merge rules, output format rules, Blender paths, `FILE_TYPE_DEFINITIONS`, `ASSET_TYPE_DEFINITIONS`). **LLM-specific settings are now located in `config/llm_settings.json`.**
+2.  **LLM Settings (`config/llm_settings.json`):** This JSON file contains settings specifically related to the LLM predictor, such as the API endpoint, model name, prompt template, and examples.
+3.  **Preset Files (`Presets/*.json`):** These JSON files define supplier-specific rules and overrides. They contain patterns to interpret filenames, classify map types, handle variants, define naming conventions, and specify other source-specific behaviors.

-The `configuration.py` module is responsible for loading the base settings from `config/app_settings.json` (including loading and saving the JSON content), merging them with the rules from the selected preset file, and providing the base configuration via the `load_base_config()` function. Preset values generally override core settings where applicable. Note that the old `config.py` file has been deleted.
+The `configuration.py` module is responsible for loading the base settings from `config/app_settings.json` and the LLM settings from `config/llm_settings.json`, merging the base settings with the rules from the selected preset file, and providing access to all settings via the `Configuration` class. The `load_base_config()` function is still available for accessing only the `app_settings.json` content directly (e.g., for the GUI editor). Preset values generally override core settings where applicable. Note that the old `config.py` file has been deleted.

 ## Supplier Management (`config/suppliers.json`)

@@ -24,12 +25,13 @@ The `Configuration` class is central to the new configuration system. It is resp

 *   **Initialization:** An instance is created with a specific `preset_name`.
 *   **Loading:**
-    *   It first loads the base application settings from `config/app_settings.json`. This file now also contains the LLM-specific settings (`llm_endpoint_url`, `llm_api_key`, `llm_model_name`, `llm_temperature`, `llm_request_timeout`, `llm_predictor_prompt`, `llm_predictor_examples`).
-    *   It then loads the specified preset JSON file from the `Presets/` directory.
-*   **Merging:** The loaded settings from `app_settings.json` and the preset rules are merged into a single configuration object accessible via instance attributes. Preset values generally override the base settings from `app_settings.json` where applicable.
-*   **Validation (`_validate_configs`):** Performs basic structural validation on the loaded settings, checking for the presence of required keys and basic data types (e.g., ensuring `map_type_mapping` is a list of dictionaries).
-*   **Regex Compilation (`_compile_regex_patterns`):** A crucial step for performance. It iterates through the regex patterns defined in the merged configuration (from both `app_settings.json` and the preset) and compiles them using `re.compile` (mostly case-insensitive). These compiled regex objects are stored as instance attributes (e.g., `self.compiled_map_keyword_regex`) for fast matching during file classification. It uses a helper (`_fnmatch_to_regex`) for basic wildcard (`*`, `?`) conversion in patterns.
-*   **LLM Settings Access:** The `Configuration` class provides getter methods (e.g., `get_llm_endpoint_url()`, `get_llm_api_key()`, `get_llm_model_name()`, `get_llm_temperature()`, `get_llm_request_timeout()`, `get_llm_predictor_prompt()`, `get_llm_predictor_examples()`) to allow components like the `LLMPredictionHandler` to easily access the necessary LLM configuration values loaded from `app_settings.json`.
+    *   It first loads the base application settings from `config/app_settings.json`.
+    *   It then loads the LLM-specific settings from `config/llm_settings.json`.
+    *   Finally, it loads the specified preset JSON file from the `Presets/` directory.
+*   **Merging & Access:** The base settings from `app_settings.json` are merged with the preset rules. LLM settings are stored separately. All settings are accessible via instance properties (e.g., `config.target_filename_pattern`, `config.llm_endpoint_url`). Preset values generally override the base settings where applicable.
+*   **Validation (`_validate_configs`):** Performs basic structural validation on the loaded settings (base, LLM, and preset), checking for the presence of required keys and basic data types. Logs warnings for missing optional LLM keys.
+*   **Regex Compilation (`_compile_regex_patterns`):** Compiles regex patterns defined in the merged configuration (from base settings and the preset) for performance. Compiled regex objects are stored as instance attributes (e.g., `self.compiled_map_keyword_regex`).
+*   **LLM Settings Access:** The `Configuration` class exposes the LLM settings as properties (e.g., `config.llm_endpoint_url`, `config.llm_api_key`, `config.llm_model_name`, `config.llm_temperature`, `config.llm_request_timeout`, `config.llm_predictor_prompt`) and provides the `config.get_llm_examples()` method, so that components like the `LLMPredictionHandler` can easily access the necessary LLM configuration values loaded from `config/llm_settings.json`.

 An instance of `Configuration` is created within each worker process (`main.process_single_asset_wrapper`) to ensure that each concurrently processed asset uses the correct, isolated configuration based on the specified preset and the base application settings.
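
The loading order and merge behaviour described for `Configuration` can be sketched roughly as follows. This is a minimal illustration, not the actual class: `ConfigurationSketch`, the `root` parameter, the shallow merge, and the choice of which LLM keys count as optional are all assumptions made for the example. Only the file paths and the setting names come from the documentation above.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Assumption: which LLM keys are treated as optional is not stated in the docs.
OPTIONAL_LLM_KEYS = ("llm_endpoint_url", "llm_model_name", "llm_predictor_prompt")


class ConfigurationSketch:
    """Hypothetical stand-in for the real `Configuration` class."""

    def __init__(self, preset_name: str, root: Path):
        # 1. Base application settings.
        self._base = self._load_json(root / "config" / "app_settings.json")
        # 2. LLM settings, kept separate from the merged settings.
        self._llm = self._load_json(root / "config" / "llm_settings.json")
        # 3. Preset rules for the selected supplier.
        self._preset = self._load_json(root / "Presets" / f"{preset_name}.json")
        # Assumption: a shallow merge in which preset values override base values.
        self._merged = {**self._base, **self._preset}
        self._validate()

    @staticmethod
    def _load_json(path: Path) -> dict:
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)

    def _validate(self) -> None:
        # Warn about missing optional LLM keys instead of raising.
        for key in OPTIONAL_LLM_KEYS:
            if key not in self._llm:
                log.warning("Optional LLM setting %r is missing", key)

    @property
    def target_filename_pattern(self):
        return self._merged.get("target_filename_pattern")

    @property
    def llm_endpoint_url(self):
        return self._llm.get("llm_endpoint_url")

    def get_llm_examples(self) -> list:
        return self._llm.get("llm_predictor_examples", [])
```

Keeping the LLM settings in their own dictionary means a preset cannot accidentally override them, which matches the "stored separately" wording above.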

@@ -6,7 +6,7 @@ The LLM Predictor feature provides an alternative method for classifying asset t

 ## Configuration

-The LLM Predictor is configured via new settings in the `config/app_settings.json` file. These settings control the behavior of the LLM interaction:
+The LLM Predictor is configured via settings in the dedicated `config/llm_settings.json` file. These settings control the behavior of the LLM interaction:

 -   `llm_predictor_prompt`: The template for the prompt sent to the LLM. This prompt should guide the LLM to classify the asset based on its name and potentially other context. It can include placeholders that will be replaced with actual data during processing.
 -   `llm_endpoint_url`: The URL of the LLM API endpoint.
@@ -16,7 +16,7 @@ The LLM Predictor is configured via new settings in the `config/app_settings.jso
 -   `llm_request_timeout`: The maximum time (in seconds) to wait for a response from the LLM API.
 -   `llm_predictor_examples`: A list of example input/output pairs to include in the prompt for few-shot learning, helping the LLM understand the desired output format and classification logic.

-The prompt structure is crucial for effective classification. It should clearly instruct the LLM on the task and the expected output format. Placeholders within the prompt template (e.g., `{asset_name}`) are dynamically replaced with relevant data before the request is sent.
+These settings are loaded by the `Configuration` class (from `configuration.py`) along with the core `app_settings.json` and the selected preset. The prompt structure is crucial for effective classification. It should clearly instruct the LLM on the task and the expected output format. Placeholders within the prompt template (e.g., `{FILE_LIST}`) are dynamically replaced with relevant data before the request is sent.
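
As a rough illustration of the placeholder substitution, the sketch below fills `{FILE_LIST}` using plain string replacement. The helper name `fill_prompt` and the sample template are hypothetical; the actual handler may substitute placeholders differently. Plain replacement is used here instead of `str.format` so that literal JSON braces in the prompt template are left untouched.

```python
def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each `{NAME}` placeholder with its value; leave other braces alone."""
    prompt = template
    for name, value in values.items():
        prompt = prompt.replace("{" + name + "}", value)
    return prompt


# Hypothetical template that also contains literal JSON braces.
template = 'Classify these files:\n{FILE_LIST}\nAnswer as JSON, e.g. {"assets": []}'
prompt = fill_prompt(template, {"FILE_LIST": "wood_albedo.png\nwood_normal.png"})
```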

 ## Expected LLM Output Format (Refactored)

@@ -64,9 +64,11 @@ The `gui/llm_prediction_handler.py` module contains the `LLMPredictionHandler` c

 Key Responsibilities & Methods:

-   **Initialization**: Takes the source identifier, file list, and `Configuration` object.
-   **`run()`**: The main method executed by the thread pool. It prepares the prompt, calls the LLM (via `LLMInteractionHandler`), parses the response, and emits the result or error.
-   **Interaction**: Uses `LLMInteractionHandler` to handle the actual prompt construction and API communication (details in `03_Key_Components.md` and `llm_interaction_handler.py`).
+-   **Initialization**: Takes the source identifier, file list, and the main `Configuration` object (which has loaded settings from `app_settings.json`, `llm_settings.json`, and the active preset).
+-   **`run()`**: The main method executed by the thread pool. It prepares the prompt, calls the LLM, parses the response, and emits the result or error.
+-   **Prompt Preparation (`_prepare_prompt`)**: Uses the `Configuration` object (`self.config`) to access the `llm_predictor_prompt`, `asset_type_definitions`, `file_type_definitions`, and `llm_examples` to build the final prompt string.
+-   **API Call (`_call_llm`)**: Uses the `Configuration` object (`self.config`) to get the `llm_endpoint_url`, `llm_api_key`, `llm_model_name`, `llm_temperature`, and `llm_request_timeout` to make the API request.
+-   **Parsing (`_parse_llm_response`)**: Parses the LLM's JSON response (using `self.config` again to get valid asset/file types for validation) and constructs the `SourceRule` hierarchy.
 -   **`_parse_llm_response(response_text)`**: This method contains the **new parsing logic**:
    1.  **Sanitization**: Removes common non-JSON elements like comments (`//`, `/* */`) and markdown code fences (```json ... ```) from the raw `response_text` to increase the likelihood of successful JSON parsing.
    2.  **JSON Parsing**: Parses the sanitized string into a Python dictionary.
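
The sanitization and JSON parsing steps could look roughly like the sketch below. It is an assumption-laden illustration, not the handler's actual code: the function names and regular expressions are hypothetical. String literals are matched alongside comments so that a `//` inside a quoted value (for example a URL) is not stripped.

```python
import json
import re

# Leading/trailing markdown code fence, with an optional "json" language tag.
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*|\s*```\s*$", re.IGNORECASE)
# Either a JSON string literal (kept) or a // or /* */ comment (dropped).
_TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def sanitize_llm_json(response_text: str) -> str:
    """Strip code fences and comments from a raw LLM response."""
    text = _FENCE_RE.sub("", response_text.strip())

    def _keep_strings(match: re.Match) -> str:
        token = match.group(0)
        return token if token.startswith('"') else ""

    return _TOKEN_RE.sub(_keep_strings, text).strip()


def parse_llm_json(response_text: str) -> dict:
    """Sanitize, then parse into a Python dictionary (raises on invalid JSON)."""
    return json.loads(sanitize_llm_json(response_text))
```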