Token-based output support - needs testing
ProjectNotes/Architectureplan_token-data.md
# Implementation Plan: Path Token Data Generation

This plan outlines the steps required to generate or retrieve the data for the `[IncrementingValue]` (or `####`) and `[Sha5]` path tokens used in `OUTPUT_DIRECTORY_PATTERN` and `OUTPUT_FILENAME_PATTERN`.

## 1. Goal Recap

Enable the `[IncrementingValue]` (or `####`) and `[Sha5]` tokens in the output path patterns used by `processing_engine.py`. To do this, implement logic that generates or retrieves their values and passes them to `utils.path_utils.generate_path_from_pattern`. Also confirm that `[Date]`, `[Time]`, and `[ApplicationPath]` are already handled.

## 2. Analysis Summary & Existing Token Handling

- `[Date]`, `[Time]`, `[ApplicationPath]`: handled automatically by `utils/path_utils.py`. No changes needed.
- `[IncrementingValue]` / `####`: requires data provision based on scanning existing output directories. Implementation detailed below.
- `[Sha5]`: requires data provision (the first 5 characters of the hexadecimal SHA-256 digest of the original input file). Implementation detailed below.
- Path generation points: `_save_image()` and `_generate_metadata_file()` in `processing_engine.py`.

## 3. Implementation Plan per Token

### 3.1. `[IncrementingValue]` / `####` (Directory Scan Logic)

- **Scope & Behavior:** Determine the next available number by scanning the existing directories in the final `output_base_path` that match the `OUTPUT_DIRECTORY_PATTERN` structure. The counter is global. It is one sequence shared by every directory that matches the pattern, and it is not restarted for each value of other tokens such as `[Date]`.
- **Location:** A new utility function, `get_next_incrementing_value`, in `utils/path_utils.py`. It is called from the orchestrating code (`main.py` / `monitor.py`).
- **Mechanism:** `get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str`
  1. Parse `output_directory_pattern` to find the incrementing token (`####` or `[IncrementingValue]`) and work out the number of padding digits (one per `#`; two for `[IncrementingValue]`).
  2. Build a glob pattern from the pattern structure. Use one `[0-9]` per digit and replace other tokens with `*` (e.g., `[0-9][0-9]_*` for `##_*`).
  3. Use `output_base_path.glob()` to find matching directories.
  4. Extract the number from each matching directory name using a regex.
  5. Find the maximum existing integer value (or -1 if none).
  6. Calculate `next_value = max_value + 1`.
  7. Format `next_value` as a zero-padded string using the pattern's digit count.
  8. Return the formatted string.
- **Orchestrator (`main.py` / `monitor.py`):**
  1. Load the `Configuration` to get `OUTPUT_DIRECTORY_PATTERN`.
  2. Get `output_base_path`.
  3. Call `next_increment_str = get_next_incrementing_value(output_base_path, config.output_directory_pattern)`.
  4. Pass `next_increment_str` to `ProcessingEngine.process` as `incrementing_value`.
- **Integration (`processing_engine.py`):**
  1. Accept `incrementing_value: Optional[str]` in the `process` signature.
  2. Store it on `self.current_incrementing_value`.
  3. Add it to `token_data` (key: `'incrementingvalue'`) in `_save_image` and `_generate_metadata_file`.

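The numbering rule in the Mechanism steps can be checked without touching the filesystem. The rule is: take the maximum existing number (or -1 if there is none), add 1, and zero-pad. This minimal sketch assumes the number sits at the very start of the directory name, as in the `##_*` example. `next_increment_from_names` is a hypothetical helper for illustration only and is not part of `utils/path_utils.py`.

```python
import re
from typing import Iterable


def next_increment_from_names(names: Iterable[str], num_digits: int) -> str:
    """Hypothetical helper: next zero-padded value for names that start with a number.

    Assumes the incrementing number is a leading run of exactly `num_digits`
    ASCII digits (the `##_*` layout); any other name is ignored.
    """
    # Exactly num_digits digits, not followed by another digit.
    leading = re.compile(rf"^([0-9]{{{num_digits}}})(?![0-9])")
    max_value = -1  # no matching directory yet, so the first value is 0
    for name in names:
        m = leading.match(name)
        if m:
            max_value = max(max_value, int(m.group(1)))
    return f"{max_value + 1:0{num_digits}d}"


print(next_increment_from_names(["00_Brick", "03_Wood", "notes"], 2))  # 04
print(next_increment_from_names([], 4))  # 0000
```

The value is one past the maximum, not the first gap. Deleting an older `01_*` directory therefore does not cause `01` to be reused.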
### 3.2. `[Sha5]`

- **Scope & Behavior:** Calculate the SHA-256 hash of the original input source file and take the first 5 characters of its hexadecimal digest.
- **Location:** The orchestrating code (`main.py` / `monitor.py`), before the `ProcessingEngine` is invoked.
- **Mechanism:** Use the new utility function `calculate_sha256` in `utils/hash_utils.py`. Call it in the orchestrator, take the first 5 characters, and pass them to `ProcessingEngine.process`.
- **Integration (`processing_engine.py`):** Accept `sha5_value: Optional[str]` in `process`, store it on `self.current_sha5_value`, and add it to `token_data` (key: `'sha5'`) in `_save_image` and `_generate_metadata_file`.

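Here is a worked example of the `[Sha5]` value, which is simply the first five hexadecimal characters of the digest. The sketch hashes an in-memory stream in chunks, mirroring the chunked file read in `calculate_sha256`. It uses the standard SHA-256 test vector for `b"abc"` (digest `ba7816bf...`). `sha5_from_stream` is a hypothetical name used only for this illustration.

```python
import hashlib
import io
from typing import BinaryIO


def sha5_from_stream(stream: BinaryIO, chunk_size: int = 4096) -> str:
    """Hypothetical helper: first 5 hex characters of the SHA-256 of a binary stream."""
    digest = hashlib.sha256()
    # Feeding the data in chunks yields the same digest as hashing it in one call.
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()[:5]


print(sha5_from_stream(io.BytesIO(b"abc")))  # ba781
```

Five hex characters give 16^5 = 1,048,576 possible values. `[Sha5]` therefore works as a short, readable tag, but it is not a unique identifier: two different inputs can occasionally share one.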
## 4. Proposed Code Changes

### 4.1. `utils/hash_utils.py` (New File)

```python
# utils/hash_utils.py
import hashlib
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def calculate_sha256(file_path: Path) -> Optional[str]:
    """Calculates the SHA-256 hash of a file.

    Returns the 64-character hexadecimal digest, or None if the argument is not
    a Path to an existing regular file or the file cannot be read.
    """
    # Implementation as detailed in the previous plan revision.
    if not isinstance(file_path, Path) or not file_path.is_file():
        return None

    sha256_hash = hashlib.sha256()
    try:
        with open(file_path, "rb") as f:
            # Read in 4 KiB blocks so large archives are never loaded into memory at once.
            for byte_block in iter(lambda: f.read(4096), b""):
                sha256_hash.update(byte_block)
        return sha256_hash.hexdigest()
    except OSError as e:
        logger.error(f"Error reading file {file_path} for SHA-256: {e}", exc_info=True)
        return None
    except Exception as e:
        logger.error(f"Unexpected error calculating SHA-256 for {file_path}: {e}", exc_info=True)
        return None
```

### 4.2. `utils/path_utils.py` (Additions/Modifications)

```python
# (In utils/path_utils.py)
import logging
import re
from pathlib import Path
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# Matches any bracketed token, e.g. [Date] or [Name].
_TOKEN_RE = re.compile(r"\[[^\]]+\]")

# ... (existing generate_path_from_pattern function) ...


def _pattern_part_to_regex(part: str) -> str:
    """Escape literal text; let any other token match within one path segment."""
    return "[^/]*".join(re.escape(chunk) for chunk in _TOKEN_RE.split(part))


def get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str:
    """Determines the next incrementing value based on existing directories."""
    logger.debug(f"Calculating next increment value for pattern '{output_directory_pattern}' in '{output_base_path}'")

    # A single group for the token keeps match.groups() at exactly three values.
    match = re.match(r"(.*?)(\[IncrementingValue\]|#+)(.*)", output_directory_pattern)
    if not match:
        return "00"  # Default fallback: the pattern has no incrementing token
    prefix_pattern, increment_token, suffix_pattern = match.groups()
    # One digit per '#'; [IncrementingValue] defaults to two digits.
    num_digits = len(increment_token) if increment_token.startswith("#") else 2

    # Glob: one [0-9] class per digit, other tokens become '*' (##_[Name] -> [0-9][0-9]_*).
    glob_pattern = (
        _TOKEN_RE.sub("*", prefix_pattern)
        + "[0-9]" * num_digits
        + _TOKEN_RE.sub("*", suffix_pattern)
    )
    # Regex that pulls the number back out of each matching relative path.
    extract_regex = re.compile(
        "^" + _pattern_part_to_regex(prefix_pattern)
        + f"([0-9]{{{num_digits}}})"
        + _pattern_part_to_regex(suffix_pattern) + "$"
    )

    max_value = -1
    try:
        for item in output_base_path.glob(glob_pattern):
            if not item.is_dir():
                continue
            # Compare the path relative to the base so nested patterns such as
            # "[Date]/##_[Name]" work, not only single-level names.
            relative = item.relative_to(output_base_path).as_posix()
            num_match = extract_regex.match(relative)
            if num_match:
                max_value = max(max_value, int(num_match.group(1)))
    except Exception as e:
        logger.error(f"Error searching increment values: {e}", exc_info=True)

    # Once next_value outgrows num_digits (e.g. 100 for '##'), the wider name
    # no longer matches the glob, so later scans will not see it.
    next_value = max_value + 1
    next_value_str = f"{next_value:0{num_digits}d}"
    logger.info(f"Determined next incrementing value: {next_value_str}")
    return next_value_str
```

### 4.3. `main.py` / `monitor.py` (Orchestration - Revised Call)

- **Imports:** add `from utils.hash_utils import calculate_sha256` and `from utils.path_utils import get_next_incrementing_value`.
- **Before the `ProcessingEngine.process` call:**
  1. Get `archive_path` and `output_dir`.
  2. Load `config = Configuration(...)`.
  3. `full_sha = calculate_sha256(archive_path)`
  4. `sha5_value = full_sha[:5] if full_sha else None`
  5. `next_increment_str = get_next_incrementing_value(output_dir, config.output_directory_pattern)`
- **Modify the call:** `engine.process(..., incrementing_value=next_increment_str, sha5_value=sha5_value)`.

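The orchestration steps can be sketched as a small helper that builds the two keyword arguments for `engine.process`. This is an illustration under assumptions. `build_token_kwargs` is a hypothetical name. The hashing and increment functions are passed in as parameters so the sketch runs on its own; in `main.py` / `monitor.py` they would be `calculate_sha256` and `get_next_incrementing_value`.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def build_token_kwargs(
    archive_path: Path,
    output_dir: Path,
    output_directory_pattern: str,
    sha256_fn: Callable[[Path], Optional[str]],
    increment_fn: Callable[[Path, str], str],
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: collect the token values passed to ProcessingEngine.process."""
    full_sha = sha256_fn(archive_path)
    return {
        # calculate_sha256 returns None on failure; keep None so the token is simply not set.
        "sha5_value": full_sha[:5] if full_sha else None,
        "incrementing_value": increment_fn(output_dir, output_directory_pattern),
    }


# Usage with stand-in functions in place of the real utilities:
kwargs = build_token_kwargs(
    Path("input.zip"),
    Path("output"),
    "##_[Name]",
    sha256_fn=lambda p: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    increment_fn=lambda base, pattern: "04",
)
print(kwargs)  # {'sha5_value': 'ba781', 'incrementing_value': '04'}
# engine.process(..., **kwargs)
```

Note that the number is computed before the output directory exists. If `monitor.py` handles several inputs concurrently, or values are computed for a whole batch up front, two inputs can receive the same number. Computing it immediately before each `process` call, one input at a time, avoids this.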
### 4.4. `processing_engine.py`

- **Imports:** ensure `Optional`, `logging`, and `generate_path_from_pattern` are imported.
- **`process` method:**
  - Update the signature: `def process(..., incrementing_value: Optional[str] = None, sha5_value: Optional[str] = None) -> ...:`
  - Store the arguments: `self.current_incrementing_value = incrementing_value`, `self.current_sha5_value = sha5_value`.
- **`_save_image` and `_generate_metadata_file` methods:** before calling `generate_path_from_pattern`, add the stored values to `token_data`:

```python
# Add new token data if available.
# (token_data is the method's existing dict; log is the module's logger.)
if getattr(self, 'current_incrementing_value', None) is not None:
    token_data['incrementingvalue'] = self.current_incrementing_value
if getattr(self, 'current_sha5_value', None) is not None:
    token_data['sha5'] = self.current_sha5_value
log.debug(f"Token data for path generation: {token_data}")
```

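The following is a minimal sketch of how these pieces could fit together inside the engine. It assumes a heavily simplified `ProcessingEngine`; the real class has more arguments and steps. Initialising both attributes in `__init__` makes the `hasattr` checks unnecessary. A shared hypothetical helper, `_add_path_tokens`, keeps `_save_image` and `_generate_metadata_file` from repeating the same lines.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger(__name__)


class ProcessingEngine:
    """Simplified stand-in for the real engine; only the token plumbing is shown."""

    def __init__(self) -> None:
        # Defined up front so helpers never need hasattr() checks.
        self.current_incrementing_value: Optional[str] = None
        self.current_sha5_value: Optional[str] = None

    def process(self, incrementing_value: Optional[str] = None, sha5_value: Optional[str] = None) -> None:
        # Overwritten on every call, so values never carry over from a previous input.
        self.current_incrementing_value = incrementing_value
        self.current_sha5_value = sha5_value
        # ... the real processing steps would follow here ...

    def _add_path_tokens(self, token_data: Dict[str, str]) -> None:
        """Hypothetical helper shared by _save_image and _generate_metadata_file."""
        if self.current_incrementing_value is not None:
            token_data["incrementingvalue"] = self.current_incrementing_value
        if self.current_sha5_value is not None:
            token_data["sha5"] = self.current_sha5_value
        log.debug(f"Token data for path generation: {token_data}")


engine = ProcessingEngine()
engine.process(incrementing_value="04", sha5_value="ba781")
token_data: Dict[str, str] = {}
engine._add_path_tokens(token_data)
print(token_data)  # {'incrementingvalue': '04', 'sha5': 'ba781'}
```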