# Architecture Overview

ByzPy is organized into three cooperating tiers that provide Byzantine-robust distributed learning:

1. **Application Layer** – user-facing APIs such as aggregators, attacks, pre-aggregators, parameter-server helpers, peer-to-peer training, and the honest/Byzantine node classes.
2. **Scheduling Layer** – the computation-graph primitives (`GraphInput`, `GraphNode`, `Operator`, `ComputationGraph`) plus the `NodeScheduler` and `ActorPool` that orchestrate execution.
3. **Actor Layer** – lightweight worker backends (threads, processes, GPUs, remote TCP/UCX actors) that actually run tasks.

All higher-level concepts build on these tiers, so understanding their responsibilities makes it easier to extend the framework.

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│  Aggregators │ Attacks │ Pre-Aggregators │ PS/P2P Helpers   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Scheduling Layer                       │
│  ComputationGraph │ NodeScheduler │ ActorPool │ Operators   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Actor Layer                         │
│    Thread │ Process │ GPU │ Remote (TCP/UCX) │ Channels     │
└─────────────────────────────────────────────────────────────┘
```

## 1. Application Layer

The application layer exposes the APIs most users interact with. This is where you define what computations happen, but not how or where they execute.

### Aggregators

Aggregators (`byzpy.aggregators.*`) combine multiple gradient vectors into a single aggregated gradient. They are the core of Byzantine-robust learning:

- **Coordinate-wise aggregators**: Median, Trimmed Mean, Mean of Medians
- **Geometric aggregators**: Krum, Multi-Krum, Geometric Median, MDA, MoNNA, SMEA
- **Norm-wise aggregators**: Center Clipping, CGE, CAF

Each aggregator can be dropped into a node pipeline and supports optional parallel execution via subtasks.

### Attacks

Attacks (`byzpy.attacks.*`) simulate malicious behavior by generating adversarial gradients:

- **Empire**: Scaled mean of honest gradients
- **Sign Flip**: Negated gradients
- **Label Flip**: Gradients computed with flipped labels
- **Little**: Small random perturbations
- **Gaussian**: Gaussian noise injection
- **Inf**: Extreme gradient values
- **Mimic**: Mimics honest gradients

Attacks are used in simulations to test aggregator robustness.

### Pre-Aggregators

Pre-aggregators (`byzpy.pre_aggregators.*`) transform vectors before aggregation:

- **Bucketing**: Groups vectors into buckets and averages within each
- **Clipping**: Clips vectors to a maximum norm
- **ARC**: Adaptive Robust Clustering
- **NNM**: Nearest-Neighbor Mixing

### Training Orchestrators

- **Parameter Server** (`byzpy.engine.parameter_server.ps.ParameterServer`): Centralized training where a server aggregates gradients from all nodes
- **Peer-to-Peer** (`byzpy.engine.peer_to_peer.train.PeerToPeer`): Decentralized training where nodes communicate directly with neighbors

### Node Applications

- **HonestNodeApplication**: Manages honest node pipelines (gradient computation, aggregation)
- **ByzantineNodeApplication**: Manages Byzantine node pipelines (attack generation)

These components rely on the scheduling layer to wire their computation graphs, but they define the semantics of "what" gets computed.
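For intuition about the roles these components play, here is a minimal, self-contained NumPy sketch — not ByzPy's API — of what a sign-flip style attack and a coordinate-wise median aggregator compute conceptually:

```python
import numpy as np

rng = np.random.default_rng(0)

# Honest workers submit noisy estimates of the same true gradient.
true_gradient = np.array([1.0, -2.0, 0.5])
honest = [true_gradient + 0.1 * rng.standard_normal(3) for _ in range(5)]

# A sign-flip style attacker submits the negated mean of the honest gradients.
byzantine = [-np.mean(honest, axis=0) for _ in range(2)]

submissions = np.stack(honest + byzantine)  # shape: (7, 3)

# Coordinate-wise median tolerates a minority of adversarial vectors,
# whereas the plain mean is biased by them.
robust = np.median(submissions, axis=0)
naive = np.mean(submissions, axis=0)

print("coordinate-wise median:", robust)  # close to the true gradient
print("plain mean:            ", naive)   # shrunk toward zero by the attackers
```

ByzPy's aggregators implement these ideas as `Operator`s that plug into node pipelines, as described in the scheduling layer below.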
## 2. Scheduling Layer (Engine Overview)

The scheduling layer combines graphs and actors to orchestrate execution. It decides "when" things run and how to parallelize them.

### Graph Module

The graph module provides declarative computation-graph primitives:

- **`GraphInput` / `graph_input()`** – Typed handles that identify data the application must supply. These are placeholders in the graph that are filled with actual values at runtime.
- **`GraphNode`** – A single operator plus its named inputs. Nodes can depend on graph inputs or on other nodes' outputs.
- **`Operator`** – A reusable compute primitive. All aggregators, attacks, and pre-aggregators are operators. Operators can support optional subtasks for distributed execution.
- **`ComputationGraph`** – Validates nodes, enforces topological order, and tracks required inputs and outputs, ensuring the graph is a valid DAG.

### Scheduling & Execution

- **`ActorPool`** – Manages a pool of worker actors built from one or more backend configurations. Workers can be threads, processes, GPUs, or remote actors. The pool routes tasks to appropriate workers based on their capabilities.
- **`NodeScheduler`** – Walks the DAG in topological order, hydrates inputs, and runs each operator. When an operator supports subtasks, the scheduler fans work out across the actor pool and fans the results back in.
- **`MessageAwareNodeScheduler`** – An extended scheduler that supports message-driven execution, allowing nodes to wait for messages from other nodes before proceeding.

### Execution Flow

1. Build a `ComputationGraph` with nodes and their dependencies
2. Create a `NodeScheduler` with an optional `ActorPool`
3. Call `await scheduler.run({...})` with input values
4. The scheduler:
   - Validates that all required inputs are present
   - Executes nodes in topological order
   - Routes operators to the actor pool for parallel execution
   - Returns the requested outputs

Example:

```python
from byzpy.engine.graph.ops import make_single_operator_graph
from byzpy.engine.graph.scheduler import NodeScheduler
from byzpy.engine.graph.pool import ActorPool, ActorPoolConfig
from byzpy.aggregators.coordinate_wise.median import CoordinateWiseMedian


async def aggregate(gradients):
    # Create a single-node graph wrapping a coordinate-wise median aggregator.
    aggregator = CoordinateWiseMedian()
    graph = make_single_operator_graph(
        node_name="agg",
        operator=aggregator,
        input_keys=("gradients",),
    )

    # Create a pool of four thread workers and a scheduler bound to it.
    pool = ActorPool([ActorPoolConfig(backend="thread", count=4)])
    await pool.start()
    scheduler = NodeScheduler(graph, pool=pool)

    # Execute: supply the graph input, read back the node's output.
    results = await scheduler.run({"gradients": gradients})
    return results["agg"]


# `gradients` is the list of per-node gradient vectors to aggregate.
# From synchronous code, run the coroutine with an event loop, e.g.:
#   aggregated = asyncio.run(aggregate(gradients))
```

## 3. Actor Layer

The actor subsystem (`byzpy.engine.actor`) manages workers on different backends. It is responsible for "where" work runs.

### Actor Backends

| Backend | Description | Typical Use | Communication |
|---------|-------------|-------------|---------------|
| `ThreadActorBackend` | Dedicated thread per worker in the current process. | Low-latency CPU tasks, I/O-bound operations. | Shared memory |
| `ProcessActorBackend` | Separate Python process per worker. | CPU-heavy work needing true parallelism, avoiding the GIL. | Multiprocessing pipes |
| `GPUActorBackend` | CUDA-aware workers for local GPU execution. | GPU-accelerated computations. | CUDA memory |
| `RemoteActorBackend` | TCP-based remote actors. | Scaling beyond one machine, distributed training. | TCP sockets |
| `UCXRemoteActorBackend` | UCX-based remote actors. | High-speed GPU clusters, RDMA communication. | UCX/InfiniBand |
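As a rough sketch of how a mixed pool of backends might be assembled: the `"thread"` backend string comes from the scheduler example above, while `"process"` is an assumed identifier inferred from `ProcessActorBackend` — check the `ActorPoolConfig` documentation for the exact backend names your version accepts.

```python
import asyncio

from byzpy.engine.graph.pool import ActorPool, ActorPoolConfig


async def build_pool() -> ActorPool:
    # Heterogeneous pool: thread workers for light or I/O-bound tasks,
    # plus process workers (assumed backend name) for GIL-free CPU-heavy work.
    pool = ActorPool([
        ActorPoolConfig(backend="thread", count=2),
        ActorPoolConfig(backend="process", count=2),  # assumed identifier
    ])
    await pool.start()
    return pool


if __name__ == "__main__":
    asyncio.run(build_pool())
```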
### Supporting Infrastructure

- **`ActorRef`** – Async proxy that turns attribute access into remote calls. Provides a transparent interface for calling methods on remote actors.
- **Channels** – Mailbox-style queues for streaming data between actors. Support point-to-point and broadcast communication patterns.
- **Capability routing** – The `ActorPool` routes tasks to workers that satisfy declared capabilities (e.g., `"gpu"`, `"cpu"`). Tasks can specify affinity hints to prefer certain workers.

### Actor Pool Architecture

The `ActorPool` manages multiple workers and routes subtasks to them:

- Workers are created lazily when the pool starts
- Tasks are queued and dispatched to available workers
- Workers can be from different backends (heterogeneous pools)
- Task affinity hints route tasks to preferred workers
- Channels enable communication between workers

The actor layer is responsible for "where" work runs, while the scheduling layer decides "when" it runs and the application layer defines "what" runs.

## Message-Driven Execution

ByzPy supports message-driven execution for decentralized training. Nodes can:

- Wait for messages from other nodes before proceeding
- Broadcast messages to all neighbors
- Send messages to specific nodes
- Use topology-aware routing

This enables fully asynchronous peer-to-peer training where nodes progress independently based on received messages.

## Decentralized Scheduling

Each node can run its own scheduler independently:

- Nodes execute computation graphs based on local state
- Messages trigger graph execution when dependencies are met
- No central coordinator required
- True parallelism across nodes

This architecture supports both centralized (parameter server) and decentralized (peer-to-peer) training patterns.
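The following is a conceptual asyncio sketch of the message-driven pattern described above — not ByzPy code. Each node broadcasts its local value to its neighbors' mailboxes, waits until it has heard from every neighbor, and only then runs its local aggregation step. In ByzPy the analogous roles are played by channels, the `MessageAwareNodeScheduler`, and per-node computation graphs.

```python
import asyncio
import random


async def node(name, inbox, neighbors, rounds=3):
    """One peer: broadcast a local value, wait for neighbors, then aggregate."""
    for r in range(rounds):
        local = random.random()

        # Broadcast the local "gradient" to every neighbor's mailbox.
        for mailbox in neighbors:
            await mailbox.put((name, r, local))

        # Proceed only once a message from each neighbor has arrived.
        received = [local]
        for _ in neighbors:
            _, _, value = await inbox.get()
            received.append(value)

        # Local aggregation step (plain mean here; ByzPy would use a robust aggregator).
        aggregate = sum(received) / len(received)
        print(f"{name} round {r}: aggregated {aggregate:.3f}")


async def main():
    # Fully connected three-node topology: every node sends to the other two.
    inboxes = {n: asyncio.Queue() for n in ("a", "b", "c")}
    tasks = [
        node(n, inboxes[n], [inboxes[m] for m in inboxes if m != n])
        for n in inboxes
    ]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())
```

Because each node blocks only on its own mailbox, there is no central coordinator: progress is driven entirely by received messages, which is the same property the decentralized scheduler relies on.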