Implementation Plan: Path Token Data Generation

This plan outlines the steps required to implement data generation/retrieval for the [IncrementingValue], ####, and [Sha5] path tokens used in OUTPUT_DIRECTORY_PATTERN and OUTPUT_FILENAME_PATTERN.

1. Goal Recap

Enable the use of the [IncrementingValue] (or ####) and [Sha5] tokens within the output path patterns used by processing_engine.py. Implement logic to generate or retrieve the data for these tokens and pass it to utils.path_utils.generate_path_from_pattern. Confirm the existing handling of [Date], [Time], and [ApplicationPath].

2. Analysis Summary & Existing Token Handling

- [Date], [Time], [ApplicationPath]: Handled automatically by utils/path_utils.py. No changes needed.
- [IncrementingValue] / ####: Requires data provision based on scanning existing output directories. Implementation detailed below.
- [Sha5]: Requires data provision (first 5 chars of SHA-256 hash of original input file). Implementation detailed below.
- Path Generation Points: _save_image() and _generate_metadata_file() in processing_engine.py.

3. Implementation Plan per Token

3.1. [IncrementingValue] / #### (Directory Scan Logic)

- Scope & Behavior: Determine the next available incrementing number by scanning existing directories in the final output_base_path that match the OUTPUT_DIRECTORY_PATTERN structure. The value represents the next sequence number globally across the pattern structure.
- Location: New utility function get_next_incrementing_value in utils/path_utils.py, called from orchestrating code (main.py / monitor.py).
- Mechanism:
  - get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str:
    - Parses output_directory_pattern to find the incrementing token (#### or [IncrementingValue]) and determine padding digits.
    - Constructs a glob pattern based on the pattern structure (e.g., [0-9][0-9]_* for ##_*).
    - Uses output_base_path.glob() to find matching directories.
    - Extracts numerical prefixes from matching directory names using regex.
    - Finds the maximum existing integer value (or -1 if none).
    - Calculates next_value = max_value + 1.
    - Formats next_value as a zero-padded string based on the pattern's digits.
    - Returns the formatted string.
- Orchestrator (main.py/monitor.py):
  - Load Configuration to get OUTPUT_DIRECTORY_PATTERN.
  - Get output_base_path.
  - Call next_increment_str = get_next_incrementing_value(output_base_path, config.output_directory_pattern).
  - Pass next_increment_str to ProcessingEngine.process as incrementing_value.
- Integration (processing_engine.py):
  - Accept incrementing_value: Optional[str] in process signature.
  - Store on self.current_incrementing_value.
  - Add to token_data (key: 'incrementingvalue') in _save_image and _generate_metadata_file.

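To make the scan concrete, the following is a minimal sketch of the simplest case only: directories named with a two-digit number followed by an underscore, as in the `##_*` example above. The helper name `next_sequence` and the directory names are assumptions made for illustration; the planned function additionally parses the pattern itself.

```python
import re
import tempfile
from pathlib import Path


def next_sequence(base: Path, digits: int = 2) -> str:
    """Hypothetical helper: next zero-padded number among dirs named like '07_name'."""
    glob_pattern = "[0-9]" * digits + "_*"           # "[0-9][0-9]_*" for two digits
    number_re = re.compile(rf"^(\d{{{digits}}})_")   # captures the numeric prefix
    highest = -1                                      # -1 means "nothing found yet"
    for item in base.glob(glob_pattern):
        found = number_re.match(item.name)
        if item.is_dir() and found:
            highest = max(highest, int(found.group(1)))
    return f"{highest + 1:0{digits}d}"


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    empty_result = next_sequence(base)                # no directories yet -> "00"
    for name in ("00_first", "01_second", "07_gap"):
        (base / name).mkdir()
    (base / "99_file.txt").write_text("not a directory")  # files are ignored
    scanned_result = next_sequence(base)              # highest directory is 07 -> "08"
```

Two properties of this approach are worth keeping in mind. Gaps are not reused (the result after `07` is `08` even though `02` is free), and once the highest existing value is `99` with two digits, the formatted result becomes `100`, which the two-digit glob would no longer find on the next scan.
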
3.2. [Sha5]

- Scope & Behavior: Calculate SHA-256 hash of the original input source file, take the first 5 characters.
- Location: Orchestrating code (main.py / monitor.py) before ProcessingEngine invocation.
- Mechanism: Use new utility function calculate_sha256 in utils/hash_utils.py. Call this in the orchestrator, get the first 5 chars, pass to ProcessingEngine.process.
- Integration (processing_engine.py): Accept sha5_value: Optional[str] in process, store on self.current_sha5_value, add to token_data (key: 'sha5') in _save_image and _generate_metadata_file.

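As a quick illustration of what the token value looks like, the sketch below hashes a small in-memory byte string rather than a file; the input `b"abc"` is chosen only because its SHA-256 digest is a well-known test vector.

```python
import hashlib

# The hex digest of SHA-256 is always 64 characters long.
full_digest = hashlib.sha256(b"abc").hexdigest()

# [Sha5] would carry only the first five hexadecimal characters.
sha5 = full_digest[:5]

print(len(full_digest), sha5)  # 64 ba781
```

Five hexadecimal characters allow 16^5 = 1,048,576 distinct values, so the token works as a short fingerprint for telling inputs apart in a path, not as a guaranteed-unique identifier.
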
4. Proposed Code Changes

4.1. utils/hash_utils.py (New File)

    # utils/hash_utils.py
    import hashlib
    import logging
    from pathlib import Path
    from typing import Optional

    logger = logging.getLogger(__name__)

    def calculate_sha256(file_path: Path) -> Optional[str]:
        """Calculates the SHA-256 hash of a file."""
        # Implementation as detailed in the previous plan revision...
        if not isinstance(file_path, Path):
            return None
        if not file_path.is_file():
            return None
        sha256_hash = hashlib.sha256()
        try:
            with open(file_path, "rb") as f:
                for byte_block in iter(lambda: f.read(4096), b""):
                    sha256_hash.update(byte_block)
            return sha256_hash.hexdigest()
        except OSError as e:
            logger.error(f"Error reading file {file_path} for SHA-256: {e}", exc_info=True)
            return None
        except Exception as e:
            logger.error(f"Unexpected error calculating SHA-256 for {file_path}: {e}", exc_info=True)
            return None

4.2. utils/path_utils.py (Additions/Modifications)

    # (In utils/path_utils.py)
    import re
    import logging
    from pathlib import Path
    from typing import Optional, Dict

    logger = logging.getLogger(__name__)

    # ... (existing generate_path_from_pattern function) ...

    def get_next_incrementing_value(output_base_path: Path, output_directory_pattern: str) -> str:
        """Determines the next incrementing value based on existing directories."""
        # Implementation as detailed in the previous plan revision...
        logger.debug(f"Calculating next increment value for pattern '{output_directory_pattern}' in '{output_base_path}'")
        # Exactly three capture groups, matching the three names unpacked below.
        match = re.match(r"(.*?)(\[IncrementingValue\]|#+)(.*)", output_directory_pattern)
        if not match:
            return "00"  # Default fallback
        prefix_pattern, increment_token, suffix_pattern = match.groups()
        # [IncrementingValue] carries no width of its own, so it defaults to 2 digits.
        num_digits = len(increment_token) if increment_token.startswith("#") else 2
        # One [0-9] class per digit, e.g. "[0-9][0-9]" for "##".
        glob_increment_part = "[0-9]" * num_digits
        token_re = r'\[[^\]]+\]'
        glob_prefix = re.sub(token_re, '*', prefix_pattern)
        glob_suffix = re.sub(token_re, '*', suffix_pattern)
        glob_pattern = f"{glob_prefix}{glob_increment_part}{glob_suffix}"
        max_value = -1
        try:
            # Escape literal text; any other [Token] in the pattern matches arbitrary text.
            extract_prefix_re = ".*?".join(re.escape(p) for p in re.split(token_re, prefix_pattern))
            extract_suffix_re = ".*?".join(re.escape(p) for p in re.split(token_re, suffix_pattern))
            extract_regex = re.compile(rf"^{extract_prefix_re}(\d{{{num_digits}}}){extract_suffix_re}.*")
            for item in output_base_path.glob(glob_pattern):
                if item.is_dir():
                    num_match = extract_regex.match(item.name)
                    if num_match:
                        try:
                            max_value = max(max_value, int(num_match.group(1)))
                        except (ValueError, IndexError):
                            pass
        except Exception as e:
            logger.error(f"Error searching increment values: {e}", exc_info=True)
        next_value = max_value + 1
        format_string = f"{{:0{num_digits}d}}"
        next_value_str = format_string.format(next_value)
        logger.info(f"Determined next incrementing value: {next_value_str}")
        return next_value_str

4.3. main.py / monitor.py (Orchestration - Revised Call)

- Imports: Add from utils.hash_utils import calculate_sha256, from utils.path_utils import get_next_incrementing_value.
- Before ProcessingEngine.process call:
  - Get archive_path, output_dir.
  - Load config = Configuration(...).
  - full_sha = calculate_sha256(archive_path).
  - sha5_value = full_sha[:5] if full_sha else None.
  - next_increment_str = get_next_incrementing_value(output_dir, config.output_directory_pattern).
- Modify call: engine.process(..., incrementing_value=next_increment_str, sha5_value=sha5_value).

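The orchestration steps above could be gathered into one small helper, roughly as sketched here. `build_token_kwargs` is a hypothetical name, and the two callables are stand-ins for `calculate_sha256` and `get_next_incrementing_value` so that the sketch runs on its own; the real orchestrator would import those functions directly.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def build_token_kwargs(
    archive_path: Path,
    output_dir: Path,
    directory_pattern: str,
    hash_file: Callable[[Path], Optional[str]],
    next_increment: Callable[[Path, str], str],
) -> Dict[str, Optional[str]]:
    """Hypothetical glue: collect the two extra keyword arguments for engine.process."""
    full_sha = hash_file(archive_path)
    return {
        "incrementing_value": next_increment(output_dir, directory_pattern),
        # A failed hash (None) is passed on as None instead of raising on the slice.
        "sha5_value": full_sha[:5] if full_sha else None,
    }


# Stand-in callables with fixed return values, used only for this demonstration.
kwargs_ok = build_token_kwargs(
    Path("in.zip"), Path("out"), "##_[Sha5]",
    hash_file=lambda path: "ab12cd34ef",
    next_increment=lambda directory, pattern: "03",
)
kwargs_unreadable = build_token_kwargs(
    Path("missing.zip"), Path("out"), "##_[Sha5]",
    hash_file=lambda path: None,
    next_increment=lambda directory, pattern: "03",
)
```

With a helper of this shape the call would read `engine.process(..., **kwargs_ok)`, and main.py and monitor.py would share one code path for preparing the token values.
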
4.4. processing_engine.py

- Imports: Ensure Optional, logging, generate_path_from_pattern are imported.
- process Method:
  - Update signature: def process(..., incrementing_value: Optional[str] = None, sha5_value: Optional[str] = None) -> ...:
  - Store args: self.current_incrementing_value = incrementing_value, self.current_sha5_value = sha5_value.
- _save_image & _generate_metadata_file Methods:
  - Before calling generate_path_from_pattern, add stored values to token_data:

        # Add new token data if available
        if hasattr(self, 'current_incrementing_value') and self.current_incrementing_value is not None:
            token_data['incrementingvalue'] = self.current_incrementing_value
        if hasattr(self, 'current_sha5_value') and self.current_sha5_value is not None:
            token_data['sha5'] = self.current_sha5_value
        log.debug(f"Token data for path generation: {token_data}")

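Since the same token-merging lines are needed in both `_save_image` and `_generate_metadata_file`, one option is a shared helper. The sketch below is a stripped-down stand-in, not the real `ProcessingEngine`: the class name `EngineSketch`, the method `build_token_data`, and the `assetname` key are all assumptions made for illustration.

```python
from typing import Dict, Optional


class EngineSketch:
    """Hypothetical stand-in for ProcessingEngine, reduced to the token plumbing."""

    def __init__(self) -> None:
        # Defining the attributes up front makes hasattr() checks unnecessary later.
        self.current_incrementing_value: Optional[str] = None
        self.current_sha5_value: Optional[str] = None

    def process(
        self,
        incrementing_value: Optional[str] = None,
        sha5_value: Optional[str] = None,
    ) -> None:
        # Store the values so the path-generating methods can read them.
        self.current_incrementing_value = incrementing_value
        self.current_sha5_value = sha5_value

    def build_token_data(self, base: Dict[str, str]) -> Dict[str, str]:
        """Return a copy of base with the optional tokens added when available."""
        token_data = dict(base)
        if self.current_incrementing_value is not None:
            token_data["incrementingvalue"] = self.current_incrementing_value
        if self.current_sha5_value is not None:
            token_data["sha5"] = self.current_sha5_value
        return token_data


engine = EngineSketch()
engine.process(incrementing_value="03")  # no sha5_value supplied
token_data = engine.build_token_data({"assetname": "rock"})
```

Here `token_data` contains `incrementingvalue` but no `sha5` key, because a value of `None` is skipped instead of being inserted into the dictionary.
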