Architecture Overview

ByzPy is organized into three cooperating tiers that together provide Byzantine-robust distributed learning:

  1. Application Layer – user-facing APIs such as aggregators, attacks, pre-aggregators, parameter-server helpers, peer-to-peer training, and the honest/byzantine node classes.

  2. Scheduling Layer – the computation-graph primitives (GraphInput, GraphNode, Operator, ComputationGraph) plus the NodeScheduler and ActorPool that orchestrate execution.

  3. Actor Layer – lightweight worker backends (threads, processes, GPUs, remote TCP/UCX actors) that actually run tasks.

All higher-level concepts build on top of these tiers, so understanding their responsibilities makes it easier to extend the framework.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                         │
│  Aggregators │ Attacks │ Pre-Aggregators │ PS/P2P Helpers  │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   Scheduling Layer                           │
│  ComputationGraph │ NodeScheduler │ ActorPool │ Operators   │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                      Actor Layer                             │
│  Thread │ Process │ GPU │ Remote (TCP/UCX) │ Channels      │
└─────────────────────────────────────────────────────────────┘

1. Application Layer

The application layer exposes the APIs most users interact with. This is where you define what computations happen, but not how or where they execute.

Aggregators

Aggregators (byzpy.aggregators.*) combine multiple gradient vectors into a single aggregated gradient. They are the core of Byzantine-robust learning:

  • Coordinate-wise aggregators: Median, Trimmed Mean, Mean of Medians

  • Geometric aggregators: Krum, Multi-Krum, Geometric Median, MDA, MoNNA, SMEA

  • Norm-wise aggregators: Centered Clipping, CGE, CAF

Each aggregator can be dropped into a node pipeline and supports optional parallel execution via subtasks.
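
As a concrete illustration of the coordinate-wise family, here is a minimal sketch of a coordinate-wise median in plain torch. It shows the idea only; the helper name, tensor shapes, and the use of torch tensors are assumptions, not the ByzPy aggregator API.

import torch

def coordinate_wise_median(gradients):
    # Illustrative only: take the median independently at every coordinate.
    stacked = torch.stack(gradients)      # shape: (num_nodes, dim)
    return stacked.median(dim=0).values   # shape: (dim,)

# Example: 5 nodes, 10-dimensional gradient vectors
grads = [torch.randn(10) for _ in range(5)]
aggregated = coordinate_wise_median(grads)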

Attacks

Attacks (byzpy.attacks.*) simulate malicious behavior by generating adversarial gradients:

  • Empire: Scaled mean of honest gradients

  • Sign Flip: Negated gradients

  • Label Flip: Gradients computed with flipped labels

  • Little: Small random perturbations

  • Gaussian: Gaussian noise injection

  • Inf: Extreme gradient values

  • Mimic: Mimics honest gradients

Attacks are used in simulations to test aggregator robustness.
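
To make the attack idea concrete, below is a minimal sign-flip sketch in plain torch, where the Byzantine vector is the negated mean of the honest gradients. This is one common formulation; the function name and shapes are illustrative assumptions, not the ByzPy attack API.

import torch

def sign_flip(honest_gradients, scale=1.0):
    # Illustrative only: negate (and optionally scale) the honest mean.
    mean = torch.stack(honest_gradients).mean(dim=0)
    return -scale * mean

byz_grad = sign_flip([torch.randn(10) for _ in range(5)])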

Pre-Aggregators

Pre-aggregators (byzpy.pre_aggregators.*) transform vectors before aggregation; a conceptual sketch of bucketing follows the list:

  • Bucketing: Groups vectors into buckets and averages within each

  • Clipping: Clips vectors to a maximum norm

  • ARC: Adaptive Robust Clipping

  • NNM: Nearest-Neighbor Mixing
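
A minimal sketch of the bucketing idea, assuming plain torch tensors and a random permutation into fixed-size buckets; this illustrates the transform only and is not the ByzPy pre-aggregator API.

import torch

def bucketing(gradients, bucket_size=2):
    # Illustrative only: shuffle, group into buckets, average within each.
    perm = torch.randperm(len(gradients)).tolist()
    shuffled = [gradients[i] for i in perm]
    buckets = [shuffled[i:i + bucket_size]
               for i in range(0, len(shuffled), bucket_size)]
    return [torch.stack(bucket).mean(dim=0) for bucket in buckets]

mixed = bucketing([torch.randn(10) for _ in range(6)], bucket_size=2)  # 3 averaged vectors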

Training Orchestrators

  • Parameter Server (byzpy.engine.parameter_server.ps.ParameterServer): Centralized training where a server aggregates gradients from all nodes (a conceptual round is sketched after this list)

  • Peer-to-Peer (byzpy.engine.peer_to_peer.train.PeerToPeer): Decentralized training where nodes communicate directly with neighbors
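
To show how a centralized round composes these pieces, the sketch below strings an attack and a robust aggregation together in plain Python. It is a conceptual outline under the same torch-tensor assumption as the sketches above, not the ParameterServer API.

import torch

def ps_round(honest_grads, num_byzantine=1):
    # Illustrative only: one server round with a sign-flip adversary.
    byz = -torch.stack(honest_grads).mean(dim=0)        # Byzantine gradient
    received = honest_grads + [byz] * num_byzantine     # the server's view
    return torch.stack(received).median(dim=0).values   # robust aggregation

step = ps_round([torch.randn(10) for _ in range(5)])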

Node Applications

  • HonestNodeApplication: Manages honest node pipelines (gradient computation, aggregation)

  • ByzantineNodeApplication: Manages Byzantine node pipelines (attack generation)

These components rely on the scheduling layer to wire their computation graphs, but they define the semantics of “what” gets computed.

2. Scheduling Layer (Engine Overview)

The scheduling layer combines graphs and actors to orchestrate execution. It decides “when” things run and how to parallelize them.

Graph Module

The graph module provides declarative computation graph primitives; a plain-Python sketch of the DAG ordering follows the list:

  • GraphInput / graph_input() – Typed handles that identify data the application must supply. These are placeholders in the graph that get filled with actual values at runtime.

  • GraphNode – A single operator plus its named inputs. Nodes can depend on graph inputs or other nodes’ outputs.

  • Operator – Reusable compute primitive. All aggregators, attacks, and pre-aggregators are operators. Operators can support optional subtasks for distributed execution.

  • ComputationGraph – Validates nodes, enforces topological order, tracks required inputs/outputs. Ensures the graph is a valid DAG.
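
The validation and ordering described above can be pictured with the standard-library graphlib, independent of ByzPy's ComputationGraph API; the node names and dependency map below are invented for the example.

from graphlib import TopologicalSorter, CycleError

# Each key depends on the nodes in its value set, mirroring how GraphNode
# inputs may reference graph inputs or other nodes' outputs.
deps = {
    "clip":   {"gradients"},  # pre-aggregation depends on the raw input
    "agg":    {"clip"},       # aggregation depends on the pre-aggregated vectors
    "update": {"agg"},        # model update depends on the aggregated gradient
}

try:
    order = list(TopologicalSorter(deps).static_order())
    print(order)  # e.g. ['gradients', 'clip', 'agg', 'update']
except CycleError:
    print("graph is not a valid DAG")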

Scheduling & Execution

  • ActorPool – Manages a pool of worker actors from one or more backend configurations. Workers can be threads, processes, GPUs, or remote actors. The pool routes tasks to appropriate workers based on capabilities.

  • NodeScheduler – Walks the DAG in topological order, hydrates inputs, and runs each operator. When operators support subtasks, the scheduler fans out work across the actor pool and fans in results.

  • MessageAwareNodeScheduler – Extended scheduler that supports message-driven execution, allowing nodes to wait for messages from other nodes before proceeding.

Execution Flow

  1. Build a ComputationGraph with nodes and their dependencies

  2. Create a NodeScheduler with an optional ActorPool

  3. Call await scheduler.run({...}) with input values

  4. The scheduler:

    • Validates that all required inputs are present

    • Executes nodes in topological order

    • Routes operator subtasks to the actor pool for parallel execution where supported

    • Returns the requested outputs

Example:

import asyncio

from byzpy.engine.graph.ops import make_single_operator_graph
from byzpy.engine.graph.scheduler import NodeScheduler
from byzpy.engine.graph.pool import ActorPool, ActorPoolConfig
from byzpy.aggregators.coordinate_wise.median import CoordinateWiseMedian

async def main(gradients):
    # Create a single-node graph that wraps the aggregator.
    aggregator = CoordinateWiseMedian()
    graph = make_single_operator_graph(
        node_name="agg",
        operator=aggregator,
        input_keys=("gradients",),
    )

    # Create a pool of four thread workers and a scheduler bound to it.
    pool = ActorPool([ActorPoolConfig(backend="thread", count=4)])
    await pool.start()
    scheduler = NodeScheduler(graph, pool=pool)

    # Execute the graph with the supplied per-node gradient vectors.
    results = await scheduler.run({"gradients": gradients})
    return results["agg"]  # the aggregated gradient

# aggregated = asyncio.run(main(gradients))

3. Actor Layer

The actor subsystem (byzpy.engine.actor) manages workers on different backends. It is responsible for “where” work runs.

Actor Backends

  • ThreadActorBackend – Dedicated thread per worker in the current process. Typical use: low-latency CPU tasks and I/O-bound operations. Communication: shared memory.

  • ProcessActorBackend – Separate Python process per worker. Typical use: CPU-heavy work needing true parallelism while avoiding the GIL. Communication: multiprocessing pipes.

  • GPUActorBackend – CUDA-aware workers for local GPU execution. Typical use: GPU-accelerated computations. Communication: CUDA memory.

  • RemoteActorBackend – TCP-based remote actors. Typical use: scaling beyond one machine, distributed training. Communication: TCP sockets.

  • UCXRemoteActorBackend – UCX-based remote actors. Typical use: high-speed GPU clusters, RDMA communication. Communication: UCX/InfiniBand.

Supporting Infrastructure

  • ActorRef – Async proxy that turns attribute access into remote calls. Provides a transparent interface for calling methods on remote actors.

  • Channels – Mailbox-style queues for streaming data between actors. Supports point-to-point and broadcast communication patterns.

  • Capability routing – The ActorPool routes tasks to workers that satisfy declared capabilities (e.g., "gpu", "cpu"). Tasks can specify affinity hints to prefer certain workers.

Actor Pool Architecture

The ActorPool manages multiple workers and routes subtasks to them; a configuration sketch follows this list:

  • Workers are created lazily when the pool starts

  • Tasks are queued and dispatched to available workers

  • Workers can be from different backends (heterogeneous pools)

  • Task affinity hints route tasks to preferred workers

  • Channels enable communication between workers
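
A minimal sketch of a heterogeneous pool, reusing only the ActorPool and ActorPoolConfig calls shown in the earlier example. The "process" backend string and the two-backend mix are assumptions about the configuration, not confirmed API.

from byzpy.engine.graph.pool import ActorPool, ActorPoolConfig

async def make_heterogeneous_pool():
    # Assumption: backend strings mirror the backend classes listed above.
    pool = ActorPool([
        ActorPoolConfig(backend="thread", count=2),   # low-latency / I/O-bound subtasks
        ActorPoolConfig(backend="process", count=2),  # CPU-heavy subtasks
    ])
    await pool.start()
    return pool

# e.g. inside an async entry point: pool = await make_heterogeneous_pool()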

The actor layer is responsible for “where” work runs, while the scheduling layer decides “when” it runs and the application layer defines “what” runs.

Message-Driven Execution

ByzPy supports message-driven execution for decentralized training. Nodes can:

  • Wait for messages from other nodes before proceeding

  • Broadcast messages to all neighbors

  • Send messages to specific nodes

  • Use topology-aware routing

This enables fully asynchronous peer-to-peer training where nodes progress independently based on received messages.
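
The waiting-and-broadcasting behavior can be pictured with plain asyncio queues, independent of ByzPy's channel and scheduler APIs; the node logic below is purely illustrative.

import asyncio

async def node(name, inbox, neighbors, expected):
    # Wait for the expected number of neighbor messages before proceeding.
    received = [await inbox.get() for _ in range(expected)]
    print(f"{name} proceeds after receiving {received}")
    # Broadcast the local result to every neighbor.
    for box in neighbors:
        await box.put(f"{name}:result")

async def main():
    a, b = asyncio.Queue(), asyncio.Queue()
    # Seed each node with one message so both can make progress.
    await a.put("b:init")
    await b.put("a:init")
    await asyncio.gather(node("a", a, [b], 1), node("b", b, [a], 1))

asyncio.run(main())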

Decentralized Scheduling

Each node can run its own scheduler independently:

  • Nodes execute computation graphs based on local state

  • Messages trigger graph execution when dependencies are met

  • No central coordinator required

  • True parallelism across nodes

This architecture supports both centralized (parameter server) and decentralized (peer-to-peer) training patterns.