Implementation Plan: Path Token Data Generation

This plan outlines the steps required to implement data generation/retrieval for the [IncrementingValue], ####, and [Sha5] path tokens used in OUTPUT_DIRECTORY_PATTERN and OUTPUT_FILENAME_PATTERN.

1. Goal Recap

- Enable the use of the [IncrementingValue] (or ####) and [Sha5] tokens within the output path patterns used by processing_engine.py.
- Implement logic to generate/retrieve data for these tokens and pass it to utils.path_utils.generate_path_from_pattern.
- Confirm the existing handling of [Date], [Time], and [ApplicationPath].

2. Analysis Summary & Existing Token Handling

- [Date], [Time], [ApplicationPath]: Handled automatically by utils/path_utils.py. No changes needed.
- [IncrementingValue] / ####: Requires data provision based on scanning existing output directories. Implementation detailed below.
- [Sha5]: Requires data provision (the first 5 characters of the SHA-256 hex digest of the original input file). Implementation detailed below.
- Path generation points: _save_image() and _generate_metadata_file() in processing_engine.py.

3. Implementation Plan per Token

3.1. [IncrementingValue] / #### (Directory Scan Logic)

- Scope & Behavior: Determine the next available incrementing number by scanning the existing directories in the final output_base_path whose names match the OUTPUT_DIRECTORY_PATTERN structure. The value is a single global sequence: the highest number found among all matching directories, plus one.
- Location: New utility function get_next_incrementing_value in utils/path_utils.py, called from the orchestrating code (main.py / monitor.py).
- Mechanism: get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str:
  - Parses output_directory_pattern to find the incrementing token (#### or [IncrementingValue]) and determine the number of padding digits ([IncrementingValue] defaults to 2).
  - Constructs a glob pattern from the pattern structure (e.g., [0-9][0-9]_* for ##_*).
  - Uses output_base_path.glob() to find matching directories.
  - Extracts the numeric portion of each matching directory name using a regex.
  - Finds the maximum existing integer value (or -1 if none).
  - Calculates next_value = max_value + 1.
  - Formats next_value as a zero-padded string with the pattern's digit count.
  - Returns the formatted string.
- Orchestrator (main.py / monitor.py):
  - Load the Configuration to get OUTPUT_DIRECTORY_PATTERN.
  - Get output_base_path.
  - Call next_increment_str = get_next_incrementing_value(output_base_path, config.output_directory_pattern).
  - Pass next_increment_str to ProcessingEngine.process as incrementing_value.
- Integration (processing_engine.py):
  - Accept incrementing_value: Optional[str] in the process signature.
  - Store it on self.current_incrementing_value.
  - Add it to token_data (key: 'incrementingvalue') in _save_image and _generate_metadata_file.

3.2. [Sha5]

- Scope & Behavior: Calculate the SHA-256 hash of the original input source file and take the first 5 characters of its hex digest.
- Location: Orchestrating code (main.py / monitor.py), before ProcessingEngine is invoked.
- Mechanism: Use the new utility function calculate_sha256 in utils/hash_utils.py. Call it in the orchestrator, take the first 5 characters, and pass them to ProcessingEngine.process.
- Integration (processing_engine.py): Accept sha5_value: Optional[str] in process, store it on self.current_sha5_value, and add it to token_data (key: 'sha5') in _save_image and _generate_metadata_file.

4. Proposed Code Changes

4.1. utils/hash_utils.py (New File)

```python
# utils/hash_utils.py
import hashlib
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def calculate_sha256(file_path: Path) -> Optional[str]:
    """Calculate the SHA-256 hash of a file.

    Returns the hex digest, or None if the path is not an existing file
    or cannot be read.
    """
    if not isinstance(file_path, Path):
        return None
    if not file_path.is_file():
        return None

    sha256_hash = hashlib.sha256()
    try:
        with open(file_path, "rb") as f:
            # Read in 4 KiB chunks so large inputs are never loaded into memory at once.
            for byte_block in iter(lambda: f.read(4096), b""):
                sha256_hash.update(byte_block)
        return sha256_hash.hexdigest()
    except OSError as e:
        logger.error(f"Error reading file {file_path} for SHA-256: {e}", exc_info=True)
        return None
    except Exception as e:
        logger.error(f"Unexpected error calculating SHA-256 for {file_path}: {e}", exc_info=True)
        return None
```

4.2. utils/path_utils.py (Additions/Modifications)

```python
# (In utils/path_utils.py)
import re
import logging
from pathlib import Path
from typing import Optional, Dict

logger = logging.getLogger(__name__)

# ... (existing generate_path_from_pattern function) ...


def get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str:
    """Determine the next incrementing value based on existing directories."""
    logger.debug(f"Calculating next increment value for pattern '{output_directory_pattern}' in '{output_base_path}'")

    # Split around the first incrementing token. The alternation is ONE group,
    # so match.groups() yields exactly (prefix, token, suffix).
    match = re.match(r"(.*?)(\[IncrementingValue\]|#+)(.*)", output_directory_pattern)
    if not match:
        return "00"  # Default fallback: the pattern has no incrementing token
    prefix_pattern, increment_token, suffix_pattern = match.groups()
    # A run of '#' sets the padding width; [IncrementingValue] defaults to 2 digits.
    num_digits = len(increment_token) if increment_token.startswith("#") else 2

    # Glob: one [0-9] class per digit; any other [Token] becomes a '*' wildcard.
    glob_increment_part = "[0-9]" * num_digits
    glob_prefix = re.sub(r"\[[^\]]+\]", "*", prefix_pattern)
    glob_suffix = re.sub(r"\[[^\]]+\]", "*", suffix_pattern)
    glob_pattern = f"{glob_prefix}{glob_increment_part}{glob_suffix}"

    def to_regex(part: str) -> str:
        # Escape literal text; let each [Token] match any run of characters,
        # since real directory names contain the substituted values, not "[Date]".
        pieces = re.split(r"\[[^\]]+\]", part)
        return ".*?".join(re.escape(piece) for piece in pieces)

    max_value = -1
    try:
        extract_regex = re.compile(rf"^{to_regex(prefix_pattern)}(\d{{{num_digits}}}){to_regex(suffix_pattern)}")
        for item in output_base_path.glob(glob_pattern):
            if item.is_dir():
                num_match = extract_regex.match(item.name)
                if num_match:
                    max_value = max(max_value, int(num_match.group(1)))
    except Exception as e:
        logger.error(f"Error searching increment values: {e}", exc_info=True)

    next_value = max_value + 1
    next_value_str = f"{next_value:0{num_digits}d}"  # zero-pad to the pattern's width
    logger.info(f"Determined next incrementing value: {next_value_str}")
    return next_value_str
```

4.3. main.py / monitor.py (Orchestration: Revised Call)

- Imports: Add from utils.hash_utils import calculate_sha256 and from utils.path_utils import get_next_incrementing_value.
- Before the ProcessingEngine.process call:
  - Get archive_path and output_dir.
  - Load config = Configuration(...).
  - full_sha = calculate_sha256(archive_path)
  - sha5_value = full_sha[:5] if full_sha else None
  - next_increment_str = get_next_incrementing_value(output_dir, config.output_directory_pattern)
- Modify the call: engine.process(..., incrementing_value=next_increment_str, sha5_value=sha5_value).

4.4. processing_engine.py

- Imports: Ensure Optional, logging, and generate_path_from_pattern are imported.
- process method:
  - Update the signature: def process(..., incrementing_value: Optional[str] = None, sha5_value: Optional[str] = None) -> ...:
  - Store the arguments: self.current_incrementing_value = incrementing_value, self.current_sha5_value = sha5_value.
- _save_image and _generate_metadata_file methods: Before calling generate_path_from_pattern, add the stored values to token_data:

```python
# Add new token data if available (getattr guards against process() not having run yet)
if getattr(self, "current_incrementing_value", None) is not None:
    token_data['incrementingvalue'] = self.current_incrementing_value
if getattr(self, "current_sha5_value", None) is not None:
    token_data['sha5'] = self.current_sha5_value
log.debug(f"Token data for path generation: {token_data}")
```
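The plan does not show how generate_path_from_pattern consumes token_data. As an illustration only, the hypothetical fill_tokens helper below assumes that [Token] names are looked up by their lowercase form (consistent with the 'incrementingvalue' and 'sha5' keys used above) and that a run of # characters is an alias for 'incrementingvalue'. The real function may behave differently.

```python
import re
from typing import Dict

# HYPOTHETICAL helper, not the real utils.path_utils.generate_path_from_pattern.
# Assumptions: [Token] lookups use the lowercased token name, and a run of '#'
# maps to the 'incrementingvalue' key (whose value is already zero-padded).
_TOKEN_RE = re.compile(r"\[([A-Za-z0-9]+)\]|(#+)")


def fill_tokens(pattern: str, token_data: Dict[str, str]) -> str:
    """Replace [Token] and #### placeholders using lowercase keys in token_data."""

    def _sub(match: re.Match) -> str:
        name, hashes = match.group(1), match.group(2)
        key = "incrementingvalue" if hashes else name.lower()
        # Unknown tokens are left untouched so missing data stays visible.
        return token_data.get(key, match.group(0))

    return _TOKEN_RE.sub(_sub, pattern)


token_data = {"incrementingvalue": "07", "sha5": "ba781"}
print(fill_tokens("##_[Sha5]", token_data))                    # 07_ba781
print(fill_tokens("[IncrementingValue]-[Date]", token_data))   # 07-[Date]
```

Leaving unresolved tokens in place rather than raising makes a missing token_data entry easy to spot in the generated path; raising a KeyError would be an equally defensible design.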
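A quick check of the glob construction in get_next_incrementing_value: the expression f"[{'0-9' * num_digits}]" collapses into a single character class that matches exactly one digit, whereas the digit part needs one [0-9] class per digit. pathlib's glob() applies fnmatch-style wildcards to each path component, so this sketch uses fnmatch.fnmatchcase as a stand-in, an assumption that holds for single-component patterns such as ##_*.

```python
from fnmatch import fnmatchcase

num_digits = 2  # width implied by a "##" token

# '0-9' repeated inside ONE bracket -> "[0-90-9]": a single class, one digit only.
buggy = f"[{'0-9' * num_digits}]" + "_*"
# One "[0-9]" class per digit -> "[0-9][0-9]": exactly two digits.
fixed = "[0-9]" * num_digits + "_*"

for name in ["07_render", "7_render", "123_render"]:
    # Columns: directory name, matches single-class glob, matches per-digit glob
    print(name, fnmatchcase(name, buggy), fnmatchcase(name, fixed))
# 07_render False True
# 7_render True False
# 123_render False False
```

One consequence worth noting: with a fixed width, a directory that has overflowed it (e.g. 100_... under ##) is not matched by the per-digit glob, so once 99 is reached the scan would keep returning 100.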