
# Implementation Plan: Path Token Data Generation

This plan outlines the steps required to generate or retrieve data for the `[IncrementingValue]` (alias `####`) and `[Sha5]` path tokens used in `OUTPUT_DIRECTORY_PATTERN` and `OUTPUT_FILENAME_PATTERN`.

## 1. Goal Recap

Enable the use of the `[IncrementingValue]` (or `####`) and `[Sha5]` tokens within the output path patterns used by `processing_engine.py`. Implement logic that generates or retrieves the data for these tokens and passes it to `utils.path_utils.generate_path_from_pattern`. Confirm that `[Date]`, `[Time]`, and `[ApplicationPath]` are already handled.

## 2. Analysis Summary & Existing Token Handling

- **`[Date]`, `[Time]`, `[ApplicationPath]`:** Handled automatically by `utils/path_utils.py`. No changes needed.
- **`[IncrementingValue]` / `####`:** Requires data provided by scanning existing output directories. Implementation detailed below.
- **`[Sha5]`:** Requires data provided by the orchestrator: the first 5 hex characters of the SHA-256 hash of the original input file. Implementation detailed below.
- **Path generation points:** `_save_image()` and `_generate_metadata_file()` in `processing_engine.py`.

## 3. Implementation Plan per Token

### 3.1. `[IncrementingValue]` / `####` (Directory Scan Logic)

**Scope & Behavior:** Determine the next available incrementing number by scanning the existing directories in the final `output_base_path` that match the `OUTPUT_DIRECTORY_PATTERN` structure. The counter is global: the value is one greater than the highest number found across all matching directories, whatever the other tokens in the pattern resolve to.

**Location:** New utility function `get_next_incrementing_value` in `utils/path_utils.py`, called from the orchestrating code (`main.py` / `monitor.py`).

**Mechanism:** `get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str`

1. Parse `output_directory_pattern` to find the incrementing token (`####` or `[IncrementingValue]`) and determine the number of padding digits (one per `#`; `[IncrementingValue]` defaults to 2).
2. Construct a glob pattern from the pattern structure, using one `[0-9]` class per digit and `*` for every other `[Token]` (e.g., `[0-9][0-9]_*` for `##_*`).
3. Use `output_base_path.glob()` to find matching directories.
4. Extract the numeric part of each matching directory name with a regex.
5. Find the maximum existing integer value (or -1 if there is none).
6. Calculate `next_value = max_value + 1`.
7. Format `next_value` as a zero-padded string using the pattern's digit count.
8. Return the formatted string.
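
Steps 1, 2 and 7 are pure string manipulation and can be checked in isolation. The sketch below assumes tokens are always written as `[Name]`; `describe_increment_pattern`, `format_next_value`, and the `[AssetName]`/`[Supplier]` tokens in the examples are hypothetical names used only for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; not part of utils/path_utils.py.
_INCREMENT_RE = re.compile(r"(.*?)(\[IncrementingValue\]|#+)(.*)")
_TOKEN_RE = re.compile(r"\[[^\]]+\]")


def describe_increment_pattern(pattern: str) -> Optional[Tuple[str, int]]:
    """Return (glob_pattern, num_digits), or None if the pattern has no increment token."""
    match = _INCREMENT_RE.match(pattern)
    if not match:
        return None
    prefix, token, suffix = match.groups()
    # '####' pads to its own length; [IncrementingValue] is assumed to default to 2 digits.
    num_digits = len(token) if token.startswith("#") else 2
    # Other [Token] placeholders are unknown at scan time, so they become '*'.
    glob_pattern = _TOKEN_RE.sub("*", prefix) + "[0-9]" * num_digits + _TOKEN_RE.sub("*", suffix)
    return glob_pattern, num_digits


def format_next_value(max_value: int, num_digits: int) -> str:
    """Zero-pad max_value + 1 to num_digits (max_value is -1 when nothing was found)."""
    return f"{max_value + 1:0{num_digits}d}"
```

For example, `describe_increment_pattern("##_[AssetName]")` gives `("[0-9][0-9]_*", 2)`, and `format_next_value(-1, 2)` gives `"00"` for an empty output directory.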

**Orchestrator (`main.py` / `monitor.py`):**

1. Load `Configuration` to get `OUTPUT_DIRECTORY_PATTERN`.
2. Get `output_base_path`.
3. Call `next_increment_str = get_next_incrementing_value(output_base_path, config.output_directory_pattern)`.
4. Pass `next_increment_str` to `ProcessingEngine.process` as `incrementing_value`.

**Integration (`processing_engine.py`):**

1. Accept `incrementing_value: Optional[str]` in the `process` signature.
2. Store it on `self.current_incrementing_value`.
3. Add it to `token_data` (key: `'incrementingvalue'`) in `_save_image` and `_generate_metadata_file`.
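
The lowercase key `'incrementingvalue'` only reaches both `[IncrementingValue]` and `####` if `generate_path_from_pattern` looks tokens up case-insensitively and maps a run of `#` to that same key. This plan does not show that lookup, so the following is a hypothetical stand-in, `fill_path_tokens`, written under exactly that assumption to illustrate how the key and the pattern meet.

```python
import re
from typing import Dict

# Hypothetical stand-in for the substitution step of generate_path_from_pattern;
# the real function's lookup rules are assumed here, not confirmed by this plan.
_PLACEHOLDER_RE = re.compile(r"\[([^\]]+)\]|#+")


def fill_path_tokens(pattern: str, token_data: Dict[str, str]) -> str:
    """Replace [Token] with token_data['token'] and a '#' run with token_data['incrementingvalue']."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # A bare '#' run has no name, so it is assumed to share the [IncrementingValue] key.
        key = name.lower() if name is not None else "incrementingvalue"
        # Unknown tokens are left untouched so missing data stays visible in the path.
        return token_data.get(key, match.group(0))

    return _PLACEHOLDER_RE.sub(substitute, pattern)
```

Under these assumptions, `fill_path_tokens("[Sha5]_####", {"sha5": "e3b0c", "incrementingvalue": "0007"})` yields `"e3b0c_0007"`. The padding comes from `get_next_incrementing_value`, not from the substitution step.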

### 3.2. `[Sha5]`

**Scope & Behavior:** Calculate the SHA-256 hash of the original input source file and take the first 5 hex characters.

**Location:** Orchestrating code (`main.py` / `monitor.py`), before `ProcessingEngine` is invoked.

**Mechanism:** Use the new utility function `calculate_sha256` in `utils/hash_utils.py`. Call it in the orchestrator, take the first 5 characters of the result, and pass them to `ProcessingEngine.process`.

**Integration (`processing_engine.py`):** Accept `sha5_value: Optional[str]` in `process`, store it on `self.current_sha5_value`, and add it to `token_data` (key: `'sha5'`) in `_save_image` and `_generate_metadata_file`.
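
Because `[Sha5]` is just a prefix of the SHA-256 hex digest, it can be checked against well-known digests. A minimal sketch using Python's standard `hashlib`; `sha5_of_bytes` and `sha5_of_file` are hypothetical names for illustration, not the planned `calculate_sha256` API.

```python
import hashlib
from pathlib import Path
from typing import Optional

SHA5_LENGTH = 5  # number of hex characters kept for the [Sha5] token


def sha5_of_bytes(data: bytes) -> str:
    """First 5 hex characters of the SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()[:SHA5_LENGTH]


def sha5_of_file(path: Path, chunk_size: int = 65536) -> Optional[str]:
    """Stream a file through SHA-256 and return its [Sha5] value, or None if it can't be read."""
    digest = hashlib.sha256()
    try:
        with path.open("rb") as f:
            # Read in chunks so large archives are never fully loaded into memory.
            for block in iter(lambda: f.read(chunk_size), b""):
                digest.update(block)
    except OSError:
        return None
    return digest.hexdigest()[:SHA5_LENGTH]
```

For example, `sha5_of_bytes(b"abc")` is `"ba781"` and `sha5_of_bytes(b"")` is `"e3b0c"`. Five hex characters allow 16^5 = 1,048,576 values, so by the birthday bound two different inputs share a `[Sha5]` with roughly 50% probability once about 1,200 files have been processed; the token works as a short disambiguator rather than a unique ID.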

## 4. Proposed Code Changes

### 4.1. `utils/hash_utils.py` (New File)

```python
# utils/hash_utils.py
import hashlib
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def calculate_sha256(file_path: Path) -> Optional[str]:
    """Return the hex SHA-256 digest of a file, or None if it cannot be read."""
    if not isinstance(file_path, Path):
        return None
    if not file_path.is_file():
        return None

    sha256_hash = hashlib.sha256()
    try:
        with open(file_path, "rb") as f:
            # iter() with a b"" sentinel stops at end of file; chunking keeps memory use flat.
            for byte_block in iter(lambda: f.read(4096), b""):
                sha256_hash.update(byte_block)
        return sha256_hash.hexdigest()
    except OSError as e:
        logger.error(f"Error reading file {file_path} for SHA-256: {e}", exc_info=True)
        return None
    except Exception as e:
        logger.error(f"Unexpected error calculating SHA-256 for {file_path}: {e}", exc_info=True)
        return None
```

### 4.2. `utils/path_utils.py` (Additions/Modifications)

```python
# (In utils/path_utils.py)
import re
import logging
from pathlib import Path
from typing import Optional, Dict

logger = logging.getLogger(__name__)

# ... (existing generate_path_from_pattern function) ...

# Matches any [Token] placeholder, e.g. [Date].
_TOKEN_RE = re.compile(r"\[[^\]]+\]")


def _pattern_part_to_regex(pattern_part: str) -> str:
    """Escape literal text; turn each [Token] placeholder into a non-greedy wildcard."""
    return ".*?".join(re.escape(piece) for piece in _TOKEN_RE.split(pattern_part))


def get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str:
    """Determines the next incrementing value based on existing directories."""
    logger.debug(f"Calculating next increment value for pattern '{output_directory_pattern}' in '{output_base_path}'")
    # The alternation is a single group, so match.groups() yields exactly three values.
    match = re.match(r"(.*?)(\[IncrementingValue\]|#+)(.*)", output_directory_pattern)
    if not match:
        return "00"  # Default fallback: no incrementing token in the pattern
    prefix_pattern, increment_token, suffix_pattern = match.groups()
    num_digits = len(increment_token) if increment_token.startswith("#") else 2

    # One "[0-9]" class per digit (e.g. "[0-9][0-9]" for "##"); other tokens become "*".
    glob_increment_part = "[0-9]" * num_digits
    glob_prefix = _TOKEN_RE.sub("*", prefix_pattern)
    glob_suffix = _TOKEN_RE.sub("*", suffix_pattern)
    # Collapse adjacent wildcards: pathlib treats "**" as a recursive pattern.
    glob_pattern = re.sub(r"\*{2,}", "*", f"{glob_prefix}{glob_increment_part}{glob_suffix}")

    # Other tokens must be wildcards here too; escaping "[Date]" literally would never match.
    prefix_re = _pattern_part_to_regex(prefix_pattern)
    suffix_re = _pattern_part_to_regex(suffix_pattern)
    extract_regex = re.compile(rf"{prefix_re}([0-9]{{{num_digits}}}){suffix_re}")

    max_value = -1
    try:
        for item in output_base_path.glob(glob_pattern):
            if not item.is_dir():
                continue
            # Match the path relative to the base so patterns containing "/" also work.
            relative_name = item.relative_to(output_base_path).as_posix()
            num_match = extract_regex.fullmatch(relative_name)
            if num_match:
                max_value = max(max_value, int(num_match.group(1)))
    except Exception as e:
        logger.error(f"Error searching increment values: {e}", exc_info=True)

    next_value = max_value + 1
    # Once next_value needs more digits than the pattern allows (e.g. 100 for "##"),
    # it no longer matches the glob above, so later scans will not see it.
    next_value_str = f"{next_value:0{num_digits}d}"
    logger.info(f"Determined next incrementing value: {next_value_str}")
    return next_value_str
```

### 4.3. `main.py` / `monitor.py` (Orchestration - Revised Call)

- **Imports:** Add `from utils.hash_utils import calculate_sha256` and `from utils.path_utils import get_next_incrementing_value`.
- **Before the `ProcessingEngine.process` call:**
  1. Get `archive_path` and `output_dir`.
  2. Load `config = Configuration(...)`.
  3. `full_sha = calculate_sha256(archive_path)`
  4. `sha5_value = full_sha[:5] if full_sha else None`
  5. `next_increment_str = get_next_incrementing_value(output_dir, config.output_directory_pattern)`
- **Modify the call:** `engine.process(..., incrementing_value=next_increment_str, sha5_value=sha5_value)`
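
Since `main.py` and `monitor.py` both perform these steps, they could share one helper. The sketch below assumes a hypothetical `build_path_token_kwargs` that receives the hashing and scanning functions as parameters (in the real code these would be `calculate_sha256` and `get_next_incrementing_value`); injecting them keeps the sketch runnable on its own and makes the helper easy to test with stubs.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

HashFn = Callable[[Path], Optional[str]]
IncrementFn = Callable[[Path, str], str]


def build_path_token_kwargs(
    archive_path: Path,
    output_dir: Path,
    output_directory_pattern: str,
    hash_fn: HashFn,
    increment_fn: IncrementFn,
) -> Dict[str, Optional[str]]:
    """Compute the extra keyword arguments for ProcessingEngine.process (hypothetical helper)."""
    full_sha = hash_fn(archive_path)
    # A failed hash yields None, so [Sha5] is simply left out of token_data downstream.
    sha5_value = full_sha[:5] if full_sha else None
    next_increment_str = increment_fn(output_dir, output_directory_pattern)
    return {"incrementing_value": next_increment_str, "sha5_value": sha5_value}
```

The call site would then read `engine.process(..., **build_path_token_kwargs(archive_path, output_dir, config.output_directory_pattern, calculate_sha256, get_next_incrementing_value))`.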

### 4.4. `processing_engine.py`

- **Imports:** Ensure `Optional`, `logging`, and `generate_path_from_pattern` are imported.
- **`process` method:**
  - Update the signature: `def process(..., incrementing_value: Optional[str] = None, sha5_value: Optional[str] = None) -> ...:`
  - Store the arguments: `self.current_incrementing_value = incrementing_value`, `self.current_sha5_value = sha5_value`.
- **`_save_image` and `_generate_metadata_file` methods:** Before calling `generate_path_from_pattern`, add the stored values to `token_data`:

```python
# Add new token data if available (getattr guards against process() not having run yet)
incrementing_value = getattr(self, "current_incrementing_value", None)
if incrementing_value is not None:
    token_data["incrementingvalue"] = incrementing_value
sha5_value = getattr(self, "current_sha5_value", None)
if sha5_value is not None:
    token_data["sha5"] = sha5_value
log.debug(f"Token data for path generation: {token_data}")
```
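
Because `process` assigns both attributes on every call, a value from one run cannot leak into the next: a later call without `sha5_value` resets it to `None`. A minimal stand-in class showing that behavior; `EngineSketch`, `_path_token_data`, and the `"assetname"` base token are hypothetical, and the real `ProcessingEngine` does far more.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger(__name__)


class EngineSketch:
    """Hypothetical, stripped-down stand-in for ProcessingEngine's token handling."""

    def process(self, incrementing_value: Optional[str] = None, sha5_value: Optional[str] = None) -> Dict[str, str]:
        # Always assign, so values from a previous call are overwritten (or cleared).
        self.current_incrementing_value = incrementing_value
        self.current_sha5_value = sha5_value
        return self._path_token_data({"assetname": "Rock"})

    def _path_token_data(self, token_data: Dict[str, str]) -> Dict[str, str]:
        token_data = dict(token_data)  # copy so the caller's dict is not mutated
        if getattr(self, "current_incrementing_value", None) is not None:
            token_data["incrementingvalue"] = self.current_incrementing_value
        if getattr(self, "current_sha5_value", None) is not None:
            token_data["sha5"] = self.current_sha5_value
        log.debug(f"Token data for path generation: {token_data}")
        return token_data
```

Calling `process("07", "ba781")` and then `process("08")` on the same instance yields token data without a `"sha5"` key the second time.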