From: Thomas Walker Lynch
Date: Tue, 18 Nov 2025 08:27:49 +0000 (+0000)
Subject: check tool development
X-Git-Url: https://git.reasoningtechnology.com/style/static/gitweb.js?a=commitdiff_plain;h=461d3dcd8c5a007939dd84a1034c24d334c297c6;p=Harmony.git

check tool development
---

diff --git a/document/check_algorithm.org b/document/check_algorithm.org
new file mode 100644
index 0000000..d9d4f4e
--- /dev/null
+++ b/document/check_algorithm.org
@@ -0,0 +1,54 @@
+Ah yes. So ..
+
+This is what I would like you to do. Start over on the core code.
+
+1. There is a git ignore class.
+
+A 'discern' function is any function that is given a project path and a relative path into the project, and then returns either 'Accept' or 'Ignore'.
+
+The git ignore class holds a discern function stack. Said stack is initialized with a triple. The first of the triple is a discernment function that always returns 'Accept'. The second in the triple is a project path, and the third a relative path to a directory where said .gitignore was found.
+
+Initially the stack is empty.
+
+Said class has a method called `check` that returns either 'Accept' or 'Ignore'.
+
+When traversing a project tree, typically Harmony, or , the traversing function will send each (, , ) triple to the `check` method of the git ignore class instance. For our program we have one such instance.
+
+The `check` method: 1) if the discern stack is not empty, `check` calls the discern functions on the discern function stack; if any of them returns 'Ignore', the `check` method immediately returns 'Ignore'. 2) if all `discern` functions return 'Accept', and the node name is '.gitignore', then:
+2.1 the path at the top of the discern stack is compared to the path to the node given to `check`. If the path is the same, then we have a strange error: we have seen two files called .gitignore in the same directory. I guess we go buy a lottery ticket.
+2.2 the path to the node is then sent to the `parse_gitignore` function, which returns a discern function. That function, as part of the triple, is pushed onto the git ignore instance's discern function stack.
+
+Whenever the traversing function pops back up, it calls the class's 'pop' function, which pops the top off of the git ignore instance's discern function stack.
+
+A note about the gitignore file parser: currently (see to-do list) it returns one of two discern functions. If the relative path is empty, i.e. we are at the top of the project, then it returns a discern function that always returns 'Accept'. If the relative path is not empty, it returns a discern function that always returns 'Ignore'.
+
+2. The subset Harmony checker:
+
+2.1 First traverse the entire Harmony tree. That is the tree that has the tool directory the checker is running from.
+
+For each node in the Harmony tree:
+2.1.1 check if it is to be ignored; if so, skip it
+2.1.2 take said node's relative path and use it as a key for a new entry in the 'skeleton dictionary'. Make the entry value 'information' about its node.
+
+A node's 'information' includes: its modification date, its type (file, directory, or something else), and whether or not it is a leaf node in the tree. (Surely 'make_information' is a separate function.) If the 'checksum' command is given, then the node's checksum becomes part of information. (new command)
+
+
+2.2 traverse the tree and make two sets:
+2.2.1 an 'ignored' set, of the relative paths to ignored files. This will require making a git ignore class instance and calling the proper methods while doing a traversal.
+2.2.2 look up the relative path to the node in the skeleton dictionary
+2.2.2.1 if the lookup information says the node corresponds to a leaf node in the skeleton, do not descend into this node.
+2.2.2.2 if the lookup into the skeleton dictionary fails, add this node to the 'addendum' list.
+
+
+2.2 traverse the skeleton dictionary entries
+
+for each entry:
+2.2.1 Check if the relative path is in the gitignored set.
+2.2.1.1 If so, add it to the 'present_but_ignored' list. Continue to the next entry. Otherwise:
+2.2.1.2 Use the relative path (the dictionary key) to extend the project path to get a node path.
+2.2.1.2.1 check if such a node exists (likely either a file or a directory, if it exists)
+2.2.2.2.1.1 If no node is found: add said relative path to the 'missing' list, and continue to the next entry. Otherwise:
+2.2.2.2.1.2 If the skeleton dictionary information says it is a leaf node in the skeleton, but it is not a leaf node in then send this directory node to the addendum descender function. (The addendum descender function then descends from said node, while appending all node relative paths discovered to the addendum list.)
+2.2.2.2.1.3 If the node has a more recent modification time, add its relative path to the 'newer' list. If it is older, add it to the 'older' list. If it is the same age, and the 'checksum' command has been given, compare the checksums. If they differ, then add the relative path to the 'different' list.
+
+3. Make reports. Each report we are to generate corresponds to a command name. We will need additional commands to cover the additional lists I described in this spec. The 'all' command will generate all the reports, of course.
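Reviewer note: section 1 of check_algorithm.org above describes a stack of discern functions with `check` and `pop` operations. The following is a minimal sketch of that description only, not the committed code. The names `DiscernStack` and `stub_parse_gitignore` are hypothetical, and the sketch assumes that `check` receives just the relative path (the project path is held by the instance) and that the "two .gitignore files in one directory" case is reported as an exception. Indentation follows the two-space offset declared in the tool sources.

```python
from pathlib import PurePosixPath
from typing import Callable, List, Tuple

# A discern function takes (project_path, relative_path) and returns
# either "Accept" or "Ignore".
Discern = Callable[[str, PurePosixPath], str]


def stub_parse_gitignore(rel_dir: PurePosixPath) -> Discern:
  """Hypothetical stand-in for the parser in the spec's note: no patterns are parsed."""
  if rel_dir == PurePosixPath("."):
    return lambda project, rel: "Accept"  # top of the project: accept everything
  return lambda project, rel: "Ignore"    # below the top: ignore everything


class DiscernStack:
  def __init__(self, project: str) -> None:
    self.project = project
    # Each entry: (discern function, project path, directory holding the .gitignore)
    self.stack: List[Tuple[Discern, str, PurePosixPath]] = []

  def check(self, rel: PurePosixPath) -> str:
    # Step 1: any "Ignore" from a stacked discern function wins immediately.
    for discern, project, _scope_dir in self.stack:
      if discern(project, rel) == "Ignore":
        return "Ignore"
    # Step 2: an accepted .gitignore contributes a new discern function.
    if rel.name == ".gitignore":
      if self.stack and self.stack[-1][2] == rel.parent:
        raise RuntimeError(f"second .gitignore seen in directory '{rel.parent}'")
      self.stack.append((stub_parse_gitignore(rel.parent), self.project, rel.parent))
    return "Accept"

  def pop(self) -> None:
    # Called by the traversal when it leaves a directory that pushed an entry.
    if self.stack:
      self.stack.pop()


if __name__ == "__main__":
  stack = DiscernStack("/tmp/project")  # illustrative path only
  for name in (".gitignore", "src/main.py", "build/.gitignore", "build/out.o"):
    print(name, stack.check(PurePosixPath(name)))
  stack.pop()  # traversal leaves build/
  print("src/main.py", stack.check(PurePosixPath("src/main.py")))
```

With the stub parser, a non-root `.gitignore` pushes a function that ignores every path it is asked about, so the traversal has to call `pop` when it leaves that directory for later siblings to be accepted again. The sketch pops only on request and leaves it to the caller to pair each pop with a directory that actually pushed an entry.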
diff --git a/tool/skeleton_check b/tool/skeleton_check new file mode 100755 index 0000000..cc781fe --- /dev/null +++ b/tool/skeleton_check @@ -0,0 +1,216 @@ +#!/usr/bin/env python3 +# -*- mode: python; coding: utf-8; python-indent-offset: 2; indent-tabs-mode: nil -*- +""" +skeleton_check — CLI entry point for the Harmony Skeleton Auditor + +This script wires CLI argument parsing to: + + - skeleton_diff_core (core logic) + - skeleton_diff_docs (usage / help) +""" + +from __future__ import annotations + +import sys +from pathlib import Path +from typing import List, Optional + +from skeleton_diff_core import ( + work_environment, + work_structure, + work_age, + work_import, + work_export, + work_addendum, + work_suspicious, + work_version, +) +from skeleton_diff_docs import ( + work_help, + work_usage, +) + + +def CLI() -> int: + args_list = sys.argv[1:] + + if not args_list: + work_usage() + return 1 + + # 1. Global dominating commands: usage, help, version + global_dom_commands_set = { + "usage", + "help", + "version", + } + + for token in args_list: + if token in global_dom_commands_set: + if token == "usage": + work_usage() + elif token == "help": + work_help() + elif token == "version": + work_version() + return 0 + + # 2. Commands that never require a project, and those that do + commands_no_other_set = { + "environment", + } + + commands_require_other_set = { + "structure", + "age", + "import", + "export", + "suspicious", + "addendum", + "all", + } + + all_commands_set = commands_no_other_set | commands_require_other_set + + commands_list: List[str] = [] + other_root_path: Optional[Path] = None + project_needed_flag = False + earliest_requires_index: Optional[int] = None + + n_args = len(args_list) + last_index = n_args - 1 + + for index, token in enumerate(args_list): + if token in all_commands_set: + # If we already saw a project-requiring command earlier, and this is + # the last token, interpret it as the project path instead of a command. 
+ if ( + project_needed_flag + and index == last_index + and earliest_requires_index is not None + ): + other_root_path = Path(token) + break + + # Normal command + commands_list.append(token) + if token in commands_require_other_set and earliest_requires_index is None: + earliest_requires_index = index + project_needed_flag = True + + else: + # Not a known command: may be the project path, but only if a command + # that requires a project has already been seen and this is the last arg. + if ( + project_needed_flag + and index == last_index + and earliest_requires_index is not None + ): + other_root_path = Path(token) + break + + print(f"ERROR: unknown command '{token}'.", file=sys.stderr) + work_usage() + return 1 + + # 3. Post-parse checks + if project_needed_flag: + if other_root_path is None: + last_command = commands_list[-1] if commands_list else "" + print( + f"ERROR: missing after command '{last_command}'.", + file=sys.stderr, + ) + work_usage() + return 1 + + if not other_root_path.is_dir(): + print(f"ERROR: {other_root_path} is not a directory.", file=sys.stderr) + work_usage() + return 1 + + # 4. Expand 'all' + expanded_commands_list: List[str] = [] + if "all" in commands_list and len(commands_list) > 1: + print("ERROR: 'all' cannot be combined with other commands.", file=sys.stderr) + work_usage() + return 1 + + for command in commands_list: + if command == "all": + expanded_commands_list.extend([ + "environment", + "structure", + "age", + "import", + "export", + "suspicious", + "addendum", + ]) + else: + expanded_commands_list.append(command) + + commands_list = expanded_commands_list + + # 5. 
Execute commands + other_root: Optional[Path] = other_root_path + + for command in commands_list: + print(f"\n--- Running: {command} ---") + + if command == "environment": + work_environment() + + elif command == "structure": + if other_root is None: + print("ERROR: 'structure' requires .", file=sys.stderr) + work_usage() + return 1 + work_structure(other_root) + + elif command == "age": + if other_root is None: + print("ERROR: 'age' requires .", file=sys.stderr) + work_usage() + return 1 + work_age(other_root, checksum_flag=False) + + elif command == "import": + if other_root is None: + print("ERROR: 'import' requires .", file=sys.stderr) + work_usage() + return 1 + work_import(other_root) + + elif command == "export": + if other_root is None: + print("ERROR: 'export' requires .", file=sys.stderr) + work_usage() + return 1 + work_export(other_root) + + elif command == "suspicious": + if other_root is None: + print("ERROR: 'suspicious' requires .", file=sys.stderr) + work_usage() + return 1 + work_suspicious(other_root, checksum_flag=False) + + elif command == "addendum": + if other_root is None: + print("ERROR: 'addendum' requires .", file=sys.stderr) + work_usage() + return 1 + work_addendum(other_root) + + else: + # Should be unreachable, because we validated commands_list. + print(f"Unknown command: {command}") + work_usage() + return 1 + + return 0 + + +if __name__ == "__main__": + sys.exit(CLI()) diff --git a/tool/skeleton_diff_core.py b/tool/skeleton_diff_core.py new file mode 100644 index 0000000..3bd6401 --- /dev/null +++ b/tool/skeleton_diff_core.py @@ -0,0 +1,696 @@ +#!/usr/bin/env python3 +# -*- mode: python; coding: utf-8; python-indent-offset: 2; indent-tabs-mode: nil -*- +# TODO: +# - Properly parse and apply .gitignore patterns instead of the current +# heuristic: +# * At project root: .gitignore is treated as a no-op (discern always +# returns "Accept"). 
+# * Below root: .gitignore causes the entire directory subtree to be +# ignored (except for the .gitignore file itself). +# - Integrate real .gitignore parsing into the GitIgnore discern functions +# and remove the "ignore whole subtree" simplification. +""" +skeleton_diff_core — Harmony Skeleton Auditor, core logic + +Version: Major.Minor = 0.5 +Author: Thomas Walker Lynch, with Grok and Vaelorin +Date: 2025-11-18 + +This module holds the core data structures and algorithms for comparing +a Harmony project (the skeleton) against another project () that was +cloned or derived from it. + +CLI and documentation live in separate modules: + - skeleton_diff_docs.py (usage/help text) + - skeleton_check (CLI front end) +""" + +from __future__ import annotations + +import hashlib +import os +import sys +from dataclasses import dataclass +from pathlib import Path +from typing import Callable, Dict, List, Optional, Set, Tuple + +# ---------------------------------------------------------------------- +# Version +# ---------------------------------------------------------------------- +MAJOR = 0 +MINOR = 5 +VERSION = f"{MAJOR}.{MINOR}" + +# ---------------------------------------------------------------------- +# Harmony root +# ---------------------------------------------------------------------- +HARMONY_ROOT = Path(os.getenv("REPO_HOME", str(Path.cwd()))).resolve() +if not HARMONY_ROOT.exists(): + print("ERROR: $REPO_HOME not set or invalid. 
Source env_toolsmith.", file=sys.stderr) + sys.exit(1) + +# ---------------------------------------------------------------------- +# Types +# ---------------------------------------------------------------------- +DiscernResult = str # "Accept" or "Ignore" +DiscernFn = Callable[[Path, "NodeInfo"], DiscernResult] + + +@dataclass +class NodeInfo: + """Filesystem node information for comparison.""" + + is_file: bool + is_dir: bool + is_other: bool + is_leaf: bool + mtime: Optional[float] + checksum: Optional[str] + + +@dataclass +class ComparisonResults: + """Results of comparing Harmony skeleton to .""" + + missing_list: List[Path] + present_but_ignored_list: List[Path] + newer_list: List[Path] + older_list: List[Path] + different_list: List[Path] + addendum_list: List[Path] + + +# ---------------------------------------------------------------------- +# GitIgnore support +# ---------------------------------------------------------------------- +class GitIgnore: + """ + Simplified .gitignore handler based on a stack of discern functions. + + Each entry in the stack is: + (discern_fn, scope_dir_rel) + + Where: + - discern_fn(rel_path, node_info) -> "Accept" | "Ignore" + - scope_dir_rel is the directory *containing* the .gitignore file + that produced this discern_fn. + + The current implementation does not parse .gitignore patterns. Instead, + parse_gitignore() returns one of two heuristics (see TODO at top of file). + """ + + def __init__(self, project_root: Path) -> None: + self.project_root = project_root + self._stack: List[Tuple[DiscernFn, Path]] = [] + + def push(self, scope_dir_rel: Path, discern_fn: DiscernFn) -> None: + self._stack.append((discern_fn, scope_dir_rel)) + + def pop(self) -> None: + if self._stack: + self._stack.pop() + + def check(self, rel_path: Path, node_info: NodeInfo) -> DiscernResult: + """ + Apply discern functions from top of stack down. If any returns "Ignore", + we return "Ignore". If none do, we return "Accept". 
+ """ + # Most specific rules are near the top. + for discern_fn, _scope_dir_rel in reversed(self._stack): + decision = discern_fn(rel_path, node_info) + if decision == "Ignore": + return "Ignore" + return "Accept" + + +def parse_gitignore( + project_root: Path, + gitignore_rel_path: Path, + node_info: NodeInfo, +) -> DiscernFn: + """ + Stub .gitignore parser. + + For now: + - If the .gitignore is at the project root (scope directory == "."), + return a discern function that always returns "Accept". + - Otherwise, return a discern function that ignores the entire subtree + under the directory that contains the .gitignore file, except for + the .gitignore file itself. + + This is intentionally simple and marked as a TODO for future improvement. + """ + scope_dir_rel = gitignore_rel_path.parent + + if scope_dir_rel == Path("."): + def discern_root(rel_path: Path, node_info_: NodeInfo) -> DiscernResult: + # Heuristic: root-level .gitignore does nothing until we implement + # real parsing. + return "Accept" + return discern_root + + def discern_subtree(rel_path: Path, node_info_: NodeInfo) -> DiscernResult: + # Always accept the .gitignore file itself. + if rel_path == gitignore_rel_path: + return "Accept" + # Ignore everything under the scope directory. + if len(scope_dir_rel.parts) <= len(rel_path.parts): + if rel_path.parts[: len(scope_dir_rel.parts)] == scope_dir_rel.parts: + return "Ignore" + return "Accept" + + return discern_subtree + + +# ---------------------------------------------------------------------- +# Built-in ignore patterns (independent of .gitignore) +# ---------------------------------------------------------------------- +def is_builtin_ignored(rel_path: Path) -> bool: + """ + Quick filter for paths we always ignore, regardless of .gitignore. 
+ + Patterns: + - Any path under a ".git" directory + - __pycache__ directories + - .ipynb_checkpoints + - .pytest_cache + - Python bytecode files: *.pyc, *.pyo, *.pyd, *.py[cod] + - Editor backups: *~, *.bak + """ + parts = rel_path.parts + if not parts: + return False + + if ".git" in parts: + return True + + basename = parts[-1] + + if basename in { + "__pycache__", + ".ipynb_checkpoints", + ".pytest_cache", + }: + return True + + if ( + basename.endswith(".pyc") + or basename.endswith(".pyo") + or basename.endswith(".pyd") + ): + return True + + if basename.endswith("~") or basename.endswith(".bak"): + return True + + return False + + +# ---------------------------------------------------------------------- +# NodeInfo helpers +# ---------------------------------------------------------------------- +def make_node_info( + abs_path: Path, + compute_checksum_flag: bool, +) -> NodeInfo: + is_dir_flag = abs_path.is_dir() + is_file_flag = abs_path.is_file() + is_other_flag = not (is_dir_flag or is_file_flag) + + try: + stat_obj = abs_path.stat() + mtime_value = stat_obj.st_mtime + except OSError: + mtime_value = None + + checksum_value: Optional[str] = None + if compute_checksum_flag and is_file_flag: + checksum_value = compute_checksum(abs_path) + + # Leaf determination is done in a second pass after indexing. + return NodeInfo( + is_file=is_file_flag, + is_dir=is_dir_flag, + is_other=is_other_flag, + is_leaf=False, + mtime=mtime_value, + checksum=checksum_value, + ) + + +def compute_checksum(abs_path: Path) -> str: + """Compute a SHA256 checksum for a file.""" + sha = hashlib.sha256() + try: + with abs_path.open("rb") as f_obj: + while True: + block = f_obj.read(65536) + if not block: + break + sha.update(block) + except OSError: + # On error, return a sentinel so we can still compare deterministically. 
+ return "ERROR" + return sha.hexdigest() + + +# ---------------------------------------------------------------------- +# Project indexing +# ---------------------------------------------------------------------- +def index_project( + project_root: Path, + compute_checksum_flag: bool, +) -> Tuple[Dict[Path, NodeInfo], Set[Path]]: + """ + Build an index for a project tree. + + Returns: + (info_dict, ignored_set) + + Where: + - info_dict maps relative paths -> NodeInfo for all *accepted* nodes. + - ignored_set is the set of relative paths that were skipped due to + built-in ignore patterns or GitIgnore rules. + """ + info_dict: Dict[Path, NodeInfo] = {} + ignored_set: Set[Path] = set() + gitignore_obj = GitIgnore(project_root) + + def recurse(dir_rel_path: Path) -> None: + abs_dir_path = project_root / dir_rel_path + + # Handle .gitignore in this directory (if any) + gitignore_pushed_flag = False + gitignore_abs_path = abs_dir_path / ".gitignore" + if gitignore_abs_path.exists() and gitignore_abs_path.is_file(): + gitignore_rel_path = ( + dir_rel_path / ".gitignore" + if dir_rel_path != Path(".") + else Path(".gitignore") + ) + + node_info = make_node_info( + gitignore_abs_path, + compute_checksum_flag=False, + ) + + # Existing rules decide whether .gitignore itself is ignored. + decision = gitignore_obj.check(gitignore_rel_path, node_info) + if decision == "Ignore": + ignored_set.add(gitignore_rel_path) + else: + # Accept the .gitignore file and push a new discern function. + discern_fn = parse_gitignore( + project_root, + gitignore_rel_path, + node_info, + ) + gitignore_obj.push(dir_rel_path, discern_fn) + gitignore_pushed_flag = True + + # Walk directory contents + try: + entry_iter = sorted(abs_dir_path.iterdir(), key=lambda p: p.name) + except OSError: + # If we cannot list this directory, treat it as unreadable. 
+ if gitignore_pushed_flag: + gitignore_obj.pop() + return + + for abs_entry_path in entry_iter: + entry_name = abs_entry_path.name + if entry_name == ".gitignore": + # Already handled above. + continue + + if dir_rel_path == Path("."): + rel_path = Path(entry_name) + else: + rel_path = dir_rel_path / entry_name + + # Built-in ignore filter first. + if is_builtin_ignored(rel_path): + ignored_set.add(rel_path) + if abs_entry_path.is_dir(): + # Do not recurse into ignored directories. + continue + continue + + node_info = make_node_info( + abs_entry_path, + compute_checksum_flag, + ) + + decision = gitignore_obj.check(rel_path, node_info) + if decision == "Ignore": + ignored_set.add(rel_path) + if abs_entry_path.is_dir(): + # Do not recurse into ignored directories. + continue + continue + + # Accepted node: record its info. + info_dict[rel_path] = node_info + + if abs_entry_path.is_dir(): + recurse(rel_path) + + # Pop the .gitignore rule for this directory scope, if any. + if gitignore_pushed_flag: + gitignore_obj.pop() + + # Start at project root (".") + recurse(Path(".")) + + # Second pass: determine leaf nodes. + # Initialize all as leaf, then mark parents as non-leaf. + for node_info in info_dict.values(): + node_info.is_leaf = True + + for rel_path in info_dict.keys(): + parent_rel_path = rel_path.parent + if parent_rel_path in info_dict: + info_dict[parent_rel_path].is_leaf = False + + return info_dict, ignored_set + + +# ---------------------------------------------------------------------- +# Comparison +# ---------------------------------------------------------------------- +def has_children( + info_dict: Dict[Path, NodeInfo], + parent_rel_path: Path, +) -> bool: + """Return True if any node in info_dict is a strict descendant of parent.""" + parent_parts = parent_rel_path.parts + parent_len = len(parent_parts) + if parent_len == 0: + # Parent is root; any non-root path counts as a child. 
+ for rel_path in info_dict.keys(): + if rel_path != Path("."): + return True + return False + + for rel_path in info_dict.keys(): + if rel_path == parent_rel_path: + continue + if len(rel_path.parts) <= parent_len: + continue + if rel_path.parts[:parent_len] == parent_parts: + return True + return False + + +def compare_harmony_to_other( + harmony_root: Path, + other_root: Path, + compute_checksum_flag: bool, +) -> ComparisonResults: + """ + Compare Harmony (skeleton) to and produce the main lists: + + - missing_list + - present_but_ignored_list + - newer_list + - older_list + - different_list + - addendum_list + """ + harmony_info_dict, _harmony_ignored_set = index_project( + harmony_root, + compute_checksum_flag, + ) + + other_info_dict, other_ignored_set = index_project( + other_root, + compute_checksum_flag, + ) + + missing_list: List[Path] = [] + present_but_ignored_list: List[Path] = [] + newer_list: List[Path] = [] + older_list: List[Path] = [] + different_list: List[Path] = [] + addendum_set: Set[Path] = set() + + other_keys_set = set(other_info_dict.keys()) + + # First pass: walk Harmony skeleton dictionary. + for rel_path, harmony_info in harmony_info_dict.items(): + # 2.2.1: if the relative path is in the ignored set. + if rel_path in other_ignored_set: + present_but_ignored_list.append(rel_path) + continue + + other_info = other_info_dict.get(rel_path) + if other_info is None: + # 2.2.2.2.1.1: missing in . + missing_list.append(rel_path) + continue + + # 2.2.2.2.1.2: skeleton leaf vs non-leaf in . + if ( + harmony_info.is_dir + and harmony_info.is_leaf + and other_info.is_dir + and has_children(other_info_dict, rel_path) + ): + # Add all descendants of this directory in to addendum. 
+ parent_parts = rel_path.parts + parent_len = len(parent_parts) + for candidate_rel in other_keys_set: + if candidate_rel == rel_path: + continue + if len(candidate_rel.parts) <= parent_len: + continue + if candidate_rel.parts[:parent_len] == parent_parts: + addendum_set.add(candidate_rel) + + # 2.2.2.2.1.3: modification time comparison (and optional checksum). + if harmony_info.mtime is not None and other_info.mtime is not None: + if other_info.mtime > harmony_info.mtime: + newer_list.append(rel_path) + elif other_info.mtime < harmony_info.mtime: + older_list.append(rel_path) + else: + if ( + compute_checksum_flag + and harmony_info.checksum is not None + and other_info.checksum is not None + and harmony_info.checksum != other_info.checksum + ): + different_list.append(rel_path) + + # Second pass: addendum nodes that do not correspond to any skeleton entry. + for other_rel_path in other_keys_set: + if other_rel_path not in harmony_info_dict: + addendum_set.add(other_rel_path) + + addendum_list = sorted(addendum_set) + + missing_list.sort() + present_but_ignored_list.sort() + newer_list.sort() + older_list.sort() + different_list.sort() + + return ComparisonResults( + missing_list=missing_list, + present_but_ignored_list=present_but_ignored_list, + newer_list=newer_list, + older_list=older_list, + different_list=different_list, + addendum_list=addendum_list, + ) + + +# ---------------------------------------------------------------------- +# Cached comparison for command handlers +# ---------------------------------------------------------------------- +_cached_other_root: Optional[Path] = None +_cached_checksum_flag: bool = False +_cached_results: Optional[ComparisonResults] = None + + +def ensure_comparison( + other_root: Path, + compute_checksum_flag: bool = False, +) -> ComparisonResults: + global _cached_other_root + global _cached_checksum_flag + global _cached_results + + other_root_resolved = other_root.resolve() + + if ( + _cached_results is None + or 
_cached_other_root != other_root_resolved + or _cached_checksum_flag != compute_checksum_flag + ): + _cached_results = compare_harmony_to_other( + HARMONY_ROOT, + other_root_resolved, + compute_checksum_flag, + ) + _cached_other_root = other_root_resolved + _cached_checksum_flag = compute_checksum_flag + + return _cached_results + + +# ---------------------------------------------------------------------- +# Work functions (called by CLI) +# ---------------------------------------------------------------------- +def work_environment() -> int: + print("=== Environment ===") + print(f"REPO_HOME = {HARMONY_ROOT}") + for key, value in sorted(os.environ.items()): + if key.startswith(("HARMONY_", "REPO_", "PATH")) or "tool" in key.lower(): + print(f"{key} = {value}") + return 0 + + +def work_structure(other_root: Path) -> int: + results = ensure_comparison(other_root, compute_checksum_flag=False) + + print("=== Structure / Presence ===") + + if results.missing_list: + print("Missing Harmony paths in :") + for rel_path in results.missing_list: + print(f" [MISSING] {rel_path}") + print() + else: + print("No missing skeleton paths found in .") + print() + + if results.present_but_ignored_list: + print("Paths present in but ignored by its .gitignore / filters:") + for rel_path in results.present_but_ignored_list: + print(f" [IGNORED] {rel_path}") + print() + else: + print("No skeleton paths are masked by 's ignore rules.") + + return 0 + + +def work_age(other_root: Path, checksum_flag: bool = False) -> int: + results = ensure_comparison(other_root, compute_checksum_flag=checksum_flag) + + print("=== File Age Comparison ===") + + if results.newer_list: + print("Paths newer in (import candidates):") + for rel_path in results.newer_list: + print(f" [NEWER] {rel_path}") + print() + else: + print("No paths are newer in than in Harmony.") + print() + + if results.older_list: + print("Paths older in (export candidates):") + for rel_path in results.older_list: + print(f" [OLDER] 
{rel_path}") + print() + else: + print("No paths are older in than in Harmony.") + print() + + if checksum_flag and results.different_list: + print("Paths with equal mtime but different checksum (suspicious):") + for rel_path in results.different_list: + print(f" [DIFFERENT] {rel_path}") + print() + elif checksum_flag: + print("No checksum-only differences detected.") + print() + + return 0 + + +def work_import(other_root: Path) -> int: + results = ensure_comparison(other_root, compute_checksum_flag=False) + + print("=== Import Commands (newer → Harmony) ===") + + if not results.newer_list: + print(" No newer files in to import.") + return 0 + + for rel_path in results.newer_list: + src = other_root / rel_path + dst = HARMONY_ROOT / rel_path + print(f"cp {src} {dst} # clobbers older Harmony file") + + return 0 + + +def work_export(other_root: Path) -> int: + results = ensure_comparison(other_root, compute_checksum_flag=False) + + print("=== Export Commands (Harmony → ) ===") + + if not results.older_list: + print(" No stale files in to export.") + return 0 + + for rel_path in results.older_list: + src = HARMONY_ROOT / rel_path + dst = other_root / rel_path + print(f"cp {src} {dst} # clobbers stale file in ") + + return 0 + + +def work_addendum(other_root: Path) -> int: + results = ensure_comparison(other_root, compute_checksum_flag=False) + + print("=== Addendum: project-local paths in ===") + + if not results.addendum_list: + print(" None found.") + return 0 + + for rel_path in results.addendum_list: + print(f" [ADDENDUM] {rel_path}") + + return 0 + + +def work_suspicious(other_root: Path, checksum_flag: bool = False) -> int: + """ + Suspicious = checksum-only differences (when enabled) plus + present_but_ignored, grouped as "things that deserve a human look". 
+ """ + results = ensure_comparison(other_root, compute_checksum_flag=checksum_flag) + + print("=== Suspicious Paths ===") + + any_flag = False + + if results.present_but_ignored_list: + any_flag = True + print("Skeleton paths masked by 's ignore rules:") + for rel_path in results.present_but_ignored_list: + print(f" [IGNORED] {rel_path}") + print() + + if checksum_flag and results.different_list: + any_flag = True + print("Paths with equal mtime but different checksum:") + for rel_path in results.different_list: + print(f" [DIFFERENT] {rel_path}") + print() + + if not any_flag: + print(" None found.") + + return 0 + + +def work_version() -> int: + print(f"skeleton_diff version {VERSION}") + return 0 diff --git a/tool/skeleton_diff_docs.py b/tool/skeleton_diff_docs.py new file mode 100644 index 0000000..f79241b --- /dev/null +++ b/tool/skeleton_diff_docs.py @@ -0,0 +1,232 @@ +#!/usr/bin/env python3 +# -*- mode: python; coding: utf-8; python-indent-offset: 2; indent-tabs-mode: nil -*- +""" +skeleton_diff_docs — usage and help text for skeleton_diff / skeleton_check +""" + +from __future__ import annotations + +from pathlib import Path +import sys + +from skeleton_diff_core import VERSION + + +def work_usage() -> int: + program_name = Path(sys.argv[0]).name or "skeleton_check" + + print(f"Usage: {program_name} []... 
[]") + print() + print(" is required if any of the specified commands") + print("require a project to analyze.") + print() + print("Commands:") + print(" version Show program version (Major.Minor)") + print(" help Long-form documentation") + print(" usage This short summary") + print(" environment Show key environment variables (including $REPO_HOME)") + print(" structure Compare skeleton presence vs (missing / ignored)") + print(" age Compare file ages (newer / older)") + print(" import Print shell commands for pulling newer skeleton") + print(" paths from into Harmony") + print(" export Print shell commands for pushing current skeleton") + print(" paths from Harmony into ") + print(' suspicious Show paths masked by ignore rules and checksum-only') + print(" differences (when checksum mode is enabled)") + print(" addendum List project-local paths in that do not exist") + print(" in the Harmony skeleton or that live under skeleton") + print(" leaf directories") + print(" all Run the full set of analyses for a project") + print() + print("Examples:") + print(f" {program_name} usage") + print(f" {program_name} structure ../subu") + print(f" {program_name} all ../subu") + print() + print(f"Run '{program_name} help' for detailed explanations.") + return 0 + + +def work_help() -> int: + help_text = f""" +skeleton_diff — Harmony Skeleton Auditor +======================================== + +Version: {VERSION} + +1. Purpose +1.1 The skeleton_diff tool compares a Harmony project (the skeleton) with + another project () that was originally cloned from Harmony. +1.2 Over time, individual projects tend to evolve: + - Some improvements are made in projects but never pulled back to the + Harmony skeleton. + - Some improvements make it back into Harmony, leaving older projects + with stale copies of skeleton files. + - Extra directories and files appear in projects, some intentional and + some accidental. 
+1.3 skeleton_diff helps you see that drift clearly so that you can: + - Pull newer tooling back into the skeleton. + - Push newer skeleton files out into projects. + - Spot suspicious clutter, ignored paths, and structural misuse. + +2. Invocation and Argument Rules +2.1 Basic command line form: + skeleton_check []... [] +2.2 is required if any of the specified commands + require a project to analyze. +2.3 Commands are parsed from left to right as a list. The final argument + is interpreted as only if: + 2.3.1 At least one command that requires a project appears earlier in + the argument list, and + 2.3.2 There is at least one argument left after that command. +2.4 Dominating commands: + 2.4.1 If any of the following appear anywhere on the command line: + usage, help, version + then that command is executed and all other arguments are + ignored (including other commands and paths). +2.5 Commands that require : + 2.5.1 structure + 2.5.2 age + 2.5.3 import + 2.5.4 export + 2.5.5 suspicious + 2.5.6 addendum + 2.5.7 all (which expands to a sequence of project commands) +2.6 Commands that do not require a project: + 2.6.1 version + 2.6.2 help + 2.6.3 usage + 2.6.4 environment +2.7 Missing project argument: + 2.7.1 If the first command that requires a project is also the last + argument, there is no argument left to serve as + , and skeleton_check reports an error. + 2.7.2 If a command that requires a project appears before the last + argument, the last argument is interpreted as , even if + its spelling matches a command name. +2.8 Effect of “all”: + 2.8.1 The special command “all” is shorthand for: + environment, structure, age, import, export, suspicious, addendum + 2.8.2 “all” may not be combined with other commands. If present, it + must be the only non-dominating command on the line. + +3. Environment Expectations +3.1 Before running skeleton_check you are expected to: + 3.1.1 Be inside a Harmony-derived project. 
+ 3.1.2 Have already run: + source env_toolsmith + which in turn sources: + tool_shared/bespoke/env + 3.1.3 Have $REPO_HOME set to your Harmony project root. +3.2 All skeleton paths are derived from: + $REPO_HOME +3.3 The tool does not modify any files. It only reports differences and + prints suggested copy commands for you to run (or edit) manually. + +4. Core Concepts +4.1 Harmony skeleton dictionary + 4.1.1 The Harmony tree (under $REPO_HOME) is traversed once to build + a dictionary mapping relative paths to node information + (NodeInfo: type, leaf flag, mtime, and optional checksum). + 4.1.2 This dictionary is the authoritative description of the skeleton. +4.2 dictionary + 4.2.1 The tree is traversed similarly, with its own GitIgnore + instance and built-in ignore filters. + 4.2.2 The dictionary is authoritative for what actually + contains, including paths that are between and below the skeleton. +4.3 Ignore handling + 4.3.1 A GitIgnore class holds a stack of discern functions that each + accept or ignore nodes based on their relative path and NodeInfo. + 4.3.2 The current implementation does not parse .gitignore patterns. + For non-root .gitignore files, the entire subtree under that + directory is ignored (except for the .gitignore itself). This is + a simplification with a TODO to replace it with proper parsing. +4.4 Leaf nodes + 4.4.1 Leaf nodes in the Harmony skeleton are paths that have no + accepted descendants under $REPO_HOME. + 4.4.2 When has extra content under a skeleton leaf directory, + that content is treated as addendum (project-local extensions). + +5. Commands (high-level) +5.1 version, help, usage, environment + 5.1.1 version + - Prints: + skeleton_diff version . + 5.1.2 help + - Shows this detailed documentation. + 5.1.3 usage + - Shows a short, to-the-point command summary and examples. + 5.1.4 environment + - Prints $REPO_HOME and related Harmony / REPO variables, plus + PATH and selected tool-related variables. 
+5.2 structure (requires ) + 5.2.1 Uses the skeleton and dictionaries to find: + - Paths that exist in Harmony but are missing in + ([MISSING] entries). + - Paths that exist in both Harmony and , but where the + path is ignored by its .gitignore / filters + ([IGNORED] entries). +5.3 age (requires ) + 5.3.1 Compares mtimes between Harmony and for paths that + exist in both: + - NEWER: mtime > Harmony mtime (import candidates). + - OLDER: mtime < Harmony mtime (export candidates). + 5.3.2 With checksum mode enabled, paths with equal mtime but different + content (different checksum) are reported as DIFFERENT. +5.4 import / export (require ) + 5.4.1 import + - Prints cp commands to copy newer paths from back into + Harmony, overwriting older skeleton files (git history keeps + old versions). + 5.4.2 export + - Prints cp commands to copy newer skeleton paths from Harmony + into , overwriting stale project files. +5.5 suspicious (requires ) + 5.5.1 Reports skeleton paths that: + - Are present in but hidden by ignore rules, and/or + - Have equal mtime but different content when checksum mode is + enabled. + 5.5.2 These are the paths that most need human inspection. +5.6 addendum (requires ) + 5.6.1 Reports project-local paths in : + - Any path under a skeleton leaf directory that does not exist + in the skeleton, and + - Any path that appears in but not in the skeleton + dictionary at all. + 5.6.2 These are candidates to remain project-specific or to be pulled + back into the skeleton. + +6. 
Example Workflows +6.1 Inspect a specific project’s drift + 6.1.1 From a Harmony project: + source env_toolsmith + skeleton_check all ../subu +6.2 Import improvements from a project + 6.2.1 Run: + skeleton_check import ../subu +6.3 Refresh a stale project from the skeleton + 6.3.1 Run: + skeleton_check export ../some_project +6.4 Quick documentation and environment checks + 6.4.1 Without a project: + skeleton_check usage + skeleton_check help + skeleton_check version + skeleton_check environment + +7. Safety and Limitations +7.1 No automatic writes + 7.1.1 skeleton_check never changes files itself. It only prints + commands and reports. +7.2 Time-based comparison + 7.2.1 “Newer” and “older” are based on filesystem modification times. + If clocks or timestamps are misleading, results may need manual + interpretation. +7.3 Ignore semantics + 7.3.1 The current .gitignore handling is intentionally simplified: + non-root .gitignore files cause their entire directory subtree + to be ignored. This will be replaced by real pattern parsing in + a future version. +""" + print(help_text.strip()) + return 0 diff --git a/tool_shared/bespoke/scratchpad b/tool_shared/bespoke/scratchpad index f14f140..aa7c35a 100755 --- a/tool_shared/bespoke/scratchpad +++ b/tool_shared/bespoke/scratchpad @@ -4,7 +4,7 @@ import os, sys, shutil, stat, pwd, grp, subprocess HELP = """usage: scratchpad {ls|clear|help|make|write|size|find|lock|unlock} [ARGS] - ls List scratchpad in an indented tree with perms and owner (quiet if missing). + ls| list List scratchpad in an indented tree with perms and owner (quiet if missing). clear Remove all contents of scratchpad/ except top-level .gitignore. clear NAME Remove scratchpad/NAME only. make [NAME] Ensure scratchpad/ exists with .gitignore; with NAME, mkdir scratchpad/NAME. 
@@ -198,7 +198,7 @@ def CLI(): if len(sys.argv) < 2: print(HELP); return cmd, *args = sys.argv[1:] - if cmd == "ls": + if cmd == "ls" or cmd == "list": if have_sp(): ls_tree(SP) else: return elif cmd == "clear":
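
Note on the argument rules in section 2 of the help text in tool/skeleton_diff_docs.py: the sketch below is one possible reading of rules 2.3 to 2.8, not the code in this commit. The function name `parse_args` and the constant names are hypothetical. Two points are assumptions, because the help text does not settle them: when several dominating commands appear, the first one on the line wins, and unknown command names are not validated.

```python
from __future__ import annotations

# Command groups as listed in sections 2.4 to 2.8 of the help text.
DOMINATING = ("usage", "help", "version")
NEEDS_PROJECT = ("structure", "age", "import", "export", "suspicious", "addendum", "all")
ALL_EXPANSION = ("environment", "structure", "age", "import", "export", "suspicious", "addendum")


def parse_args(argv: list[str]) -> tuple[list[str], str | None]:
  """Hypothetical helper: return (commands, project_path) per rules 2.3 to 2.8."""
  # Rule 2.4: a dominating command anywhere on the line is executed alone.
  # Assumption: if several appear, the first one wins.
  for arg in argv:
    if arg in DOMINATING:
      return [arg], None

  # Rules 2.3 and 2.7: locate the first command that requires a project.
  first_needy = next((i for i, a in enumerate(argv) if a in NEEDS_PROJECT), None)

  if first_needy is None:
    # Only project-free commands were given, so there is no project path.
    commands, project = list(argv), None
  elif first_needy == len(argv) - 1:
    # Rule 2.7.1: nothing is left after the command to serve as the path.
    raise ValueError(f"'{argv[first_needy]}' requires a project path")
  else:
    # Rule 2.7.2: the last argument is the path, even if it is spelled
    # like a command name.
    commands, project = list(argv[:-1]), argv[-1]

  # Rule 2.8: "all" must stand alone and expands to the full sequence.
  if "all" in commands:
    if commands != ["all"]:
      raise ValueError("'all' may not be combined with other commands")
    commands = list(ALL_EXPANSION)

  return commands, project
```

Under this reading, `parse_args(["structure", "age"])` returns `(["structure"], "age")`: the trailing `age` is taken as the project path rather than as a command, which is the case rule 2.7.2 describes.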