5.4 KiB
Developer Guide: LLM Integration Progress
This document summarizes the goals, approach, and current progress on integrating Large Language Model (LLM) capabilities into the Asset Processor Tool for handling irregularly named asset inputs.
1. Initial Goal
The primary goal is to enhance the Asset Processor Tool's ability to process asset sources with irregular or non-standard naming conventions that cannot be reliably handled by the existing regex and keyword-based preset system. This involves leveraging an LLM to interpret lists of filenames and determine asset metadata and file classifications.
2. Agreed Approach
After initial discussion and exploring several options, the agreed approach for developing this feature is as follows:
- Dedicated LLM Preset: The LLM classification logic will be triggered by selecting a specific preset type (or flag) in the main tool, indicating that standard rule-based processing should be bypassed in favor of the LLM.
- Standalone Prototype: The core LLM interaction and classification logic is being developed as a standalone Python prototype within the
llm_prototype/directory. This allows for focused development, testing, and refinement in isolation before integration into the main application. - Configurable LLM Endpoint: The prototype is designed to allow users to configure the LLM API endpoint, supporting various providers including local LLMs (e.g., via LM Studio) and commercial APIs. API keys are handled via environment variables for security.
- Multi-Asset Handling: The prototype is being built to handle input sources that contain multiple distinct assets within a single directory or archive. The LLM is expected to identify these separate assets and return a JSON list, where each item in the list represents one asset.
- Chain of Thought (CoT) Prompting: To improve the LLM's ability to handle the complex task of identifying multiple assets and classifying files, the prompt includes instructions for the LLM to output its reasoning process within tags before generating the final JSON list.
- Unified Asset Category: The asset classification uses a single
asset_categoryfield with defined valid values:Model,Surface,Decal,ATLAS,Imperfection. - Robust JSON Extraction & Validation: The prototype includes logic to extract the JSON list from the LLM's response (handling potential extra text) and validate its structure and content against expected schemas and values.
3. Prototype Development Progress
The initial structure for the standalone prototype has been created in the llm_prototype/ directory:
llm_prototype/PLAN.md: This document outlines the detailed plan.llm_prototype/config_llm.py: Configuration file for LLM settings, expected values, and placeholders.llm_prototype/llm_classifier.py: Main script containing the core logic (loading config/input/prompt, formatting prompt, calling LLM API, extracting/validating JSON).llm_prototype/requirements_llm.txt: Lists therequestslibrary dependency.llm_prototype/prompt_template.txt: Contains the Chain of Thought prompt template with placeholders and few-shot examples.llm_prototype/README.md: Provides setup and running instructions for the prototype.llm_prototype/test_inputs/: Contains example input JSON files (dinesen_example.json,imperfections_example.json) representing file lists from asset sources.
Code has been added to llm_classifier.py for loading inputs/config/prompt, formatting the prompt, calling the API, and extracting/validating the JSON response. The JSON extraction logic has been made more robust to handle potential variations in LLM output format.
4. Current Status and Challenges
Initial testing of the prototype revealed the following:
- Successful communication with the configured LLM API endpoint.
- The LLM is attempting to follow the Chain of Thought structure and generate the list-based JSON output.
- Challenge: The LLM is currently failing to consistently produce complete and valid JSON output, leading to JSON decoding errors in the prototype script.
- Challenge: The LLM is not strictly adhering to the specified classification values (e.g., returning "Map" instead of "PBRMap"), despite the prompt explicitly listing the allowed values and including few-shot examples.
To address these challenges, few-shot examples demonstrating the expected JSON structure and exact classification values were added to the prompt_template.txt. The JSON extraction logic in llm_classifier.py was also updated to be more resilient.
5. Next Steps
The immediate next steps are focused on debugging and improving the LLM's output reliability:
- Continue testing the prototype with the updated
prompt_template.txt(including examples) using the example input files. - Analyze the terminal output to determine if the few-shot examples and improved extraction logic have resolved the JSON completeness and classification value issues.
- Based on the results, iterate on the prompt template (e.g., further emphasizing strict adherence to output format and values) and/or the JSON extraction/validation logic in
llm_classifier.pyas needed. - Repeat testing and iteration until the prototype reliably produces valid JSON output with correct classifications for the test cases.
Once the prototype demonstrates reliable classification, we can proceed to evaluate its performance and plan the integration into the main Asset Processor Tool.