Large Spectrum Models (LSMs): Decoder-Only Transformer-Powered Spectrum Activity Forecasting via Tokenized RF Data
Authors: Mohammad Mosiur Lunar, Mehmet C. Vuran
First: 2026-05-11T16:43:55+00:00 · Latest: 2026-05-11T16:43:55+00:00
Abstract
Dynamic spectrum access (DSA) has become a key pillar of next-generation wireless systems to address the spectrum scarcity due to the rapid growth of connected devices. Accurate short-term spectrum forecasting is critical for DSA, where data-driven approaches have proven most effective. Recent advances in and widespread adoption of large language model (LLM) architectures present new opportunities for spectrum prediction. In this paper, foundational large spectrum models (LSMs) are presented. A novel RF tokenizer is introduced to convert raw IQ measurements into token sequences by mapping each power-spectral density value to a fixed vocabulary along with embedding gain, frequency, FFT bin, and timestamp information. Five established open-source LLM architectures (Gemma-2B, GPT-2, LLaMA-7B, Mistral-7B, and Phi-1) are trained on this tokenized spectrum data for the task of spectrum forecasting, yielding LSMs. To leverage the scaling gains of LSMs, a fully automated outdoor wireless testbed is employed to collect over 22 TB of raw spectrum data across 33 sub-GHz frequency bands, yielding 8.4B tokens in total. Across all 33 bands, the best model (LSM-Mistral) achieves a root-mean-square error of 3.25 dB and 97% of predictions have a mean absolute error below 5 dB. Generalization of LSMs is illustrated by fine-tuning the models on data collected in different locations, where RMSE is maintained below 3.7 dB. These results demonstrate that widespread decoder-only transformer architectures can serve as effective predictive models for large-scale RF spectrum forecasting.
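The tokenization step described above (mapping each power-spectral density value to a fixed vocabulary) can be illustrated with a minimal sketch. This is not the authors' tokenizer: uniform quantization, the dB range, the vocabulary size, and the function names are all assumptions chosen for illustration, and the gain, frequency, FFT bin, and timestamp embeddings are omitted.

```python
import numpy as np

def psd_to_tokens(psd_db, lo=-120.0, hi=-20.0, vocab_size=256):
    """Uniformly quantize PSD values (dB) into token ids in [0, vocab_size - 1].

    lo, hi, and vocab_size are hypothetical; values outside [lo, hi] are clipped.
    """
    psd = np.clip(np.asarray(psd_db, dtype=float), lo, hi)
    step = (hi - lo) / vocab_size
    ids = np.floor((psd - lo) / step).astype(int)
    # A value exactly equal to hi would land in bin vocab_size; fold it into the last bin.
    return np.minimum(ids, vocab_size - 1)

def tokens_to_psd(ids, lo=-120.0, hi=-20.0, vocab_size=256):
    """Map token ids back to the center of their quantization bin (dB)."""
    step = (hi - lo) / vocab_size
    return lo + (np.asarray(ids) + 0.5) * step
```

With these assumed settings each bin is about 0.39 dB wide, so the round-trip quantization error for in-range values is at most half a bin, well below the dB-scale forecasting errors reported in the abstract.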
A Case for CATS: A Conductor-driven Asymmetric Transport Scheme for Semantic Prioritization
Authors: Syed Muhammad Aqdas Rizvi
Venue: 2025 6th International Conference on Innovative Computing (ICIC)
First: 2026-03-14T13:36:15+00:00 · Latest: 2026-05-11T12:00:36+00:00
Comments: Extended version. Contains additional mathematical formalization of the deadlock resolution constraint, detailed ns-3 simulation parameters, and further details on possible future work and extensions not present in the IEEE conference proceedings. 7 pages, 3 figures, 2 tables. Code available at https://github.com/smarizvi110/cats
Abstract
Standard transport protocols like TCP operate as a blind, FIFO conveyor belt for data, a model that is increasingly suboptimal for latency-sensitive and interactive applications. This paper challenges this model by introducing CATS (Conductor-driven Asymmetric Transport Scheme), a framework that provides TCP with the semantic awareness necessary to prioritize critical content. By centralizing scheduling intelligence in a transport-native "Conductor", CATS significantly improves user-perceived performance by delivering essential data first. This architecture directly confronts a cascade of historical performance workarounds and their limitations, including the high overhead of parallel connections in HTTP/1.1, the transport-layer Head-of-Line blocking in HTTP/2, and the observed implementation heterogeneity of prioritization in HTTP/3 over QUIC. Built upon TCP BBR, our ns-3 implementation demonstrates this principle by reducing the First Contentful Paint by over 78% in a representative webpage download configured as a deliberate worst-case scenario, with no penalty to total page load time compared to the baseline.
Agentic Performance at the Edge: Insights from Benchmarking
Authors: Shiqiang Wang, Herbert Woisetschläger
First: 2026-05-11T11:24:20+00:00 · Latest: 2026-05-11T11:24:20+00:00
Comments: Accepted to AutoEdge workshop, co-located with MobiSys 2026
Abstract
Agentic artificial intelligence (AI) is a natural fit for Internet of Things (IoT) and edge systems, but edge deployments are often constrained to models around 8 billion parameters or smaller. An important question is: How much agentic-task quality is lost when model size is constrained by memory, power, and latency budgets? To address this question, in this paper, we provide an initial empirical study considering edge-focused model scaling, general-purpose versus coder-oriented model effects, and tool-enabled execution under a fixed protocol. We introduce a domain-conditioned evaluation methodology, an implementation-grounded analysis of model-tool interactions, practical guidance for model selection under constraints, and an analysis of failure modes that reveals distinct semantic versus execution failure patterns across model families. Our core finding is that edge-agent quality is not a simple function of parameter count. Robust deployment depends on the joint design of model choice and tool workflow. Domain-conditioned analysis reveals Pareto fronts in the accuracy-latency space that can guide strategy selection based on operational priorities.
Is DRL-based MAC Ready for Underwater Acoustic Networks? Exploring Its Practicality in Real Field Experiments
Authors: Jiani Guo, Bingwen Huangfu, Shanshan Song, Nan Sun, Miao Pan, Guangjie Han
First: 2026-05-11T07:53:30+00:00 · Latest: 2026-05-11T07:53:30+00:00
Abstract
Medium Access Control (MAC) protocols rely on neighbor and environment information to design collision-free access rules for Underwater Acoustic Networks (UANs). Acquiring this information suffers from high communication overhead due to the unique underwater acoustic channel characteristics, such as long propagation delay, spatiotemporal variations in communication quality, and high attenuation. Deep Reinforcement Learning (DRL) is promising to circumvent the UANs' physical constraints and provide a low-overhead solution for underwater MAC protocols, since it can decide access rules based on real-time observation without extra information exchange. However, the unique underwater acoustic channel characteristics impose significant challenges on observation acquisition, training time, and the balance of multiple reward factors for DRL-based MAC protocols. Most existing methods remain at the theoretical level: (1) they design partial intelligent agents failing to achieve fully autonomous access; (2) they assume unreasonable simulation scenarios, weakening the effects of underwater acoustic channel characteristics on MAC protocols. To enhance the practicality of DRL-based MAC protocols, we first analyze the application challenges of DRL in UANs through real field experiments. Based on the above challenges, we propose a DRL-based MAC protocol that considers observation loss and balances multiple reward factors to achieve efficient Entire Autonomous access in the UAN (EA-MAC). To further explore the feasibility of DRL-based MAC protocols, we implement EA-MAC and other state-of-the-art protocols on underwater acoustic modems and evaluate their performance in real field experiments. Experimental results demonstrate that EA-MAC can adaptively determine the scheduling sequence for each node, enabling high-throughput and fair communication in a straightforward manner for UANs.
GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
Authors: Zengzipeng Tang, Yuxuan Sun, Wei Chen, Jianwen Ding, Bo Ai
First: 2026-05-11T07:38:56+00:00 · Latest: 2026-05-11T07:38:56+00:00
Comments: This work has been submitted to the IEEE for possible publication
Abstract
The recent growth of on-device Large Language Model (LLM) inference has driven significant interest in device-edge collaborative LLM inference. As a promising architecture, Speculative Decoding (SD) is increasingly adopted, where a lightweight draft model rapidly generates candidate tokens to be verified by a powerful target model. However, a fundamental challenge lies in achieving per-token resource scheduling to effectively adapt the SD paradigm to resource-constrained edge environments. This paper proposes a Generative Entropy- and Lyapunov-based Adaptive Token Offloading framework, named GELATO, to maximize decoding throughput under energy constraints in a device-edge collaborative SD system. Specifically, an outer drift-plus-penalty loop makes online decisions to establish a reference drafting budget, managing the long-term energy-throughput trade-off. Further, a nested entropy-driven generation mechanism executes early exiting to adapt to per-token dynamic generative uncertainty. Theoretical analysis establishes a rigorous performance bound on long-term throughput for GELATO. Extensive evaluations demonstrate that GELATO achieves a globally optimal trade-off, outperforming state-of-the-art distributed SD architectures by 64.98% in token throughput and reducing energy consumption by 47.47% in resource-constrained environments, while preserving LLM decoding quality.
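The outer drift-plus-penalty loop can be sketched in its generic Lyapunov-optimization form. This is not GELATO's formulation: the energy model, the constant acceptance rate, the expected-token expression (the usual one for speculative decoding under an i.i.d. acceptance assumption), and all names and constants below are hypothetical.

```python
def expected_tokens(k, accept_rate):
    """Expected tokens produced per verification round with k drafted tokens,
    assuming each draft token is accepted independently with accept_rate < 1."""
    return (1 - accept_rate ** (k + 1)) / (1 - accept_rate)

def energy(k, e_draft=1.0, e_verify=2.0):
    """Hypothetical energy cost of one round: k draft steps plus one verification."""
    return e_draft * k + e_verify

def choose_budget(queue, V, accept_rate, k_max=8):
    """Drift-plus-penalty decision: trade throughput (weight V) against the
    virtual energy queue backlog, which penalizes energy-hungry choices."""
    return max(
        range(1, k_max + 1),
        key=lambda k: V * expected_tokens(k, accept_rate) - queue * energy(k),
    )

def update_queue(queue, spent, avg_budget):
    """Virtual queue tracks how far cumulative energy use exceeds the average budget."""
    return max(queue + spent - avg_budget, 0.0)
```

When the virtual queue is empty the rule drafts aggressively, and as the backlog grows it falls back to shorter drafts, which is how the long-term energy constraint is enforced without knowing future conditions.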
Mixed-Criticality Flow Scheduling with Low Delay and Limited Bandwidth in TSN
Authors: Wenyan Yan, Sijing Duan, Dongsheng Wei
First: 2026-05-11T02:25:09+00:00 · Latest: 2026-05-11T02:25:09+00:00
Comments: 7 pages
Abstract
Time-Sensitive Networking (TSN) is a promising Ethernet protocol with time determinism, widely used in time-critical systems such as industrial automation, automotive networks, and avionics. By allocating dedicated time windows for time-sensitive flows, TSN enables deterministic transmission; however, as network traffic grows, multiple flows may contend for the same window, causing large delays. Frame aggregation can mitigate this by combining multiple small frames into a larger one, thereby reducing the number of frames and required time windows, but existing approaches typically handle only single-priority traffic and cannot fully utilize pre-allocated time windows. To address this limitation, we propose MCFS-2L, a mixed-criticality flow scheduling scheme with low delay and limited bandwidth usage. MCFS-2L first aggregates critical and non-critical frames with the same source and destination nodes and harmonic periods into a single frame, and then applies a dynamic reassembly and scheduling method that selectively disaggregates non-critical frames from unschedulable aggregated frames. Experimental results show that MCFS-2L increases the acceptance ratio of critical and non-critical flows by up to 4.78% and 8.58%, respectively, while reducing bandwidth utilization by up to 11.88%.
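The aggregation condition (same source, same destination, harmonic periods) can be sketched as a grouping step. The greedy grouping below is an illustrative assumption and not the MCFS-2L algorithm; the flow tuple layout and function names are hypothetical, and the dynamic reassembly step is not modeled.

```python
from collections import defaultdict

def harmonic(p, q):
    """Two periods are harmonic if the larger is an integer multiple of the smaller."""
    a, b = sorted((p, q))
    return b % a == 0

def aggregation_groups(flows):
    """Group flows that are candidates for aggregation into a single frame.

    flows: list of (name, src, dst, period_us) tuples (hypothetical layout).
    Flows are grouped greedily per (src, dst) pair; a flow joins the first
    group whose members all have periods harmonic with its own.
    """
    by_path = defaultdict(list)
    for flow in flows:
        by_path[(flow[1], flow[2])].append(flow)

    groups = []
    for path_flows in by_path.values():
        path_groups = []
        for flow in sorted(path_flows, key=lambda f: f[3]):
            for group in path_groups:
                if all(harmonic(flow[3], other[3]) for other in group):
                    group.append(flow)
                    break
            else:
                path_groups.append([flow])
        groups.extend(path_groups)
    return groups
```

For example, flows with periods 500 and 1000 on the same path fall into one group, while a 750 flow on that path stays separate because 750 is not a multiple of 500.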
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
Authors: Hanchen Li, Runyuan He, Qiuyang Mang, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica
First: 2025-11-04T03:43:05+00:00 · Latest: 2026-05-11T02:12:30+00:00
Abstract
KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. This policy breaks for agentic workloads, which interleave LLM calls with tools, introducing pauses that prevent effective KV reuse across turns. Since many tool calls have much shorter durations than human responses in multi-turn chatbots, it would be promising to retain the KV cache during these tool calls. However, many challenges remain. First, we need to consider both the potential cost of recomputation or reloading (if offloading is enabled) and the increasing queueing delays after eviction from GPU. Second, due to the internal variance of tool call durations, the method needs to remain robust under limited predictability of tool call durations.
We present Continuum, a serving system to optimize job completion time for multi-turn agent workloads by introducing a time-to-live (TTL) mechanism for KV cache retention. For requests that generate tool calls, Continuum selectively pins the KV cache in GPU memory with a time-to-live value determined by the reload cost and potential queueing delay induced by eviction. When the TTL expires, the KV cache can be automatically evicted to free up GPU memory, providing robust performance under edge cases. When combined with program-level first-come-first-serve, Continuum preserves multi-turn continuity and reduces delay for agentic workflows. Evaluations on real-world agents (SWE-Bench, BFCL, OpenHand) with Llama-3.1 8B/70B, Gemma-3 12B, and GLM-4.5 355B show that Continuum improves the average job completion times by over 8x while improving throughput.
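The TTL-based pinning idea can be sketched with a small bookkeeping class. This is not Continuum's implementation: the abstract says only that the TTL is determined by the reload cost and the eviction-induced queueing delay, so setting the TTL to their sum, along with the class and method names, is an assumption for illustration.

```python
class TTLPinnedCache:
    """Tracks which requests have their KV cache pinned and until when."""

    def __init__(self):
        self._expiry = {}  # request_id -> absolute expiry time (seconds)

    def pin(self, request_id, now, reload_cost_s, queue_delay_s):
        """Pin a request's KV cache when it issues a tool call.

        Assumed rule: keep the cache for as long as waiting is no more costly
        than evicting, i.e. ttl = reload cost + expected queueing delay.
        """
        ttl = reload_cost_s + queue_delay_s
        self._expiry[request_id] = now + ttl
        return ttl

    def is_pinned(self, request_id, now):
        """True while the request's TTL has not yet expired."""
        return self._expiry.get(request_id, float("-inf")) > now

    def evict_expired(self, now):
        """Release every entry whose TTL has expired and return their ids."""
        expired = [rid for rid, t in self._expiry.items() if t <= now]
        for rid in expired:
            del self._expiry[rid]
        return expired
```

A tool call that returns before its TTL expires finds its KV cache still resident, while an unexpectedly long tool call loses the pin and frees GPU memory for waiting requests.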
TSNBench: Benchmarking LLM Proficiency in Time-Sensitive Networking
Authors: Rubi Debnath, Daniel Bujosa Mateu, Luxi Zhao, Silviu S. Craciunas, Paul Pop, Sebastian Steinhorst
First: 2026-05-10T11:25:41+00:00 · Latest: 2026-05-10T11:25:41+00:00
Abstract
We present TSNBench, the first benchmark for evaluating large language model (LLM) proficiency in Time-Sensitive Networking (TSN), a suite of IEEE 802.1 standards for deterministic communication with bounded latency in safety-critical domains such as autonomous vehicles, aviation, defense, and industrial automation. While LLMs have been extensively evaluated on general knowledge tasks, their capabilities in safety-critical networking domains remain largely unexplored. TSNBench comprises 939 expert-validated multiple-choice questions (MCQs) covering diverse TSN mechanisms, along with 100 open-ended Worst-Case Delay (WCD) computation tasks for Credit-Based Shaper (CBS) and Cyclic Queuing and Forwarding (CQF) across varying network topologies and traffic conditions. MCQ answers are validated by domain experts, and open-ended ground truth WCD values are computed using a verified Network Calculus (NC) solver for CBS and closed-form mathematical upper bounds for CQF. We evaluate 16 LLMs and find that although models achieve 67 to 95% accuracy on MCQs, they fail substantially on open-ended WCD computation. For CBS, only GPT-5 achieves a Mean Absolute Percentage Error (MAPE) of 36.2%, meaning its predicted WCD deviates by 36.2% of the actual TSN flow delay on average, while most models exceed 80%. For CQF, the best model achieves 41.8% MAPE, with most models clustering between 80% and 100%. Such errors are large relative to TSN latency budgets and can lead to violations of real-time constraints and unsafe configurations. TSNBench demonstrates that MCQ benchmarks may overestimate LLM capabilities in safety-critical networking domains.
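The MAPE figures quoted above follow the standard definition of mean absolute percentage error, which can be written as a short helper. The function name and the sample numbers in the usage note are illustrative only and are not taken from the benchmark.

```python
def mape(predicted, actual):
    """Mean Absolute Percentage Error, in percent.

    Each error is normalized by the magnitude of the true value, so the
    ground-truth values must be nonzero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)
```

As a made-up example, predicting 136.2 and 63.8 for two flows whose true worst-case delay is 100 gives a MAPE of about 36.2%, the same size of error the abstract reports for the best model on CBS.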
PolicyCache-SDN: Hierarchical Intra-Path Learning for Adaptive SDN Traffic Control
Authors: Wenyang Jia, Jingjing Wang, Ziwei Yan, Tanren Liu, Yakun Ren, Kai Lei
First: 2026-05-10T11:01:02+00:00 · Latest: 2026-05-10T11:01:02+00:00
Abstract
Software-defined networks offer global visibility, yet centralized control loops are too slow for transient congestion and bursty traffic dynamics. Existing learned traffic control schemes often rely on offline training, making them fragile under distribution shifts. We present PolicyCache-SDN, a hierarchical SDN traffic control framework that enables local online adaptation under centralized policy control. Its key abstraction is a policy envelope: the controller compiles network-wide intent into bounded per-path action spaces, while edge agents learn and execute metering, queueing, and rerouting decisions only within those bounds. Policy envelopes also make local actions auditable and reversible when they affect shared bottlenecks. Evaluation on a 1,024-host software SDN testbed shows that PolicyCache-SDN improves average core link utilization by 35.5% over Static ECMP and 18.3% over Centralized TE. It reduces elephant-flow P99 FCT by 34.3% over end-host congestion control, lowers SLA violations from 18.2% to 6.8%, and uses less than 2% CPU and 12 MB memory per edge agent.
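The policy envelope abstraction can be illustrated as a bounds check applied to whatever action an edge agent proposes. This is a minimal sketch, not the PolicyCache-SDN data model: the fields (a rate range and a set of allowed paths) and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    """Hypothetical per-path action bounds compiled by the controller."""
    min_rate_mbps: float
    max_rate_mbps: float
    allowed_paths: frozenset

def apply_envelope(env, proposed_rate_mbps, proposed_path, current_path):
    """Project an agent's proposed action into the envelope.

    The metering rate is clamped to the allowed range, and a reroute is
    accepted only if the target path is permitted; otherwise the flow stays put.
    """
    rate = min(max(proposed_rate_mbps, env.min_rate_mbps), env.max_rate_mbps)
    path = proposed_path if proposed_path in env.allowed_paths else current_path
    return rate, path
```

Because every executed action is the output of this projection, the controller can reason about worst-case local behavior from the envelope alone, regardless of what the online learner proposes.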
The source code is available in an anonymized repository at https://anonymous.4open.science/r/JCC2026-PolicyCache-SDN/.
The Carrier Pigeon Internet Protocol: An Algorithmic (and Lighthearted) Perspective
Authors: Matthias Bentert, Shay Kutten, Darya Melnyk, Tijana Milentijevic, Stefan Schmid
First: 2026-05-10T09:10:29+00:00 · Latest: 2026-05-10T09:10:29+00:00
Abstract
The theoretical model behind the pigeon post as a link layer in a communication network was introduced by Shannon (under the guise of studying One-Time Pads for cryptography). That is, to send a one-hop message to $v$, a node $u$ needs a mail pigeon bred and raised at $v$. When sending a message using a pigeon to $v$, node $u$ loses the pigeon. To send another message to $v$, node $u$ needs another pigeon of $v$. It has been demonstrated that the communication bandwidth achievable with pigeon post can exceed that of networks using other media. This has already motivated the introduction of Internet standards that allow the use of pigeons as Internet link-layer media.
In this paper, we begin to fill in the missing piece: designing algorithms for breeding and scheduling pigeons to meet a given communication demand efficiently, minimizing the number of pigeons required. We consider singlehop, 2-hop, and multihop pigeon use. While the singlehop variant admits a simple characterization, both the 2-hop and the multihop variants are NP-hard. For the latter variants, we present a polynomial-time algorithm based on demand aggregation that achieves a 2-approximation for the number of pigeons used. We believe that this pigeon-based perspective offers both amusing and instructive insights into network design and hopefully, into ornithology.
Chain-of-Thought Reasoning Enhances In-Context Learning for LLM-Based Mobile Traffic Prediction
Authors: MohammadMahdi Ghadaksaz, Mohammad Farzanullah, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci
First: 2026-05-10T02:11:52+00:00 · Latest: 2026-05-10T02:11:52+00:00
Abstract
Accurate short-term mobile traffic prediction is important for proactive resource allocation and low-latency network management in fifth-generation (5G) and sixth-generation (6G) networks. While large language models (LLMs) can perform in-context learning (ICL) without task-specific retraining, naive ICL prompting may suffer from numerical instability and limited temporal reasoning when traffic dynamics fluctuate rapidly. In this paper, we propose a chain-of-thought (CoT)-enabled LLM-based mobile traffic prediction framework that operates in two phases: (i) an offline phase that constructs structured CoT demonstrations by generating rationales via a plan-based CoT (PCoT) pipeline (lecture, plan, and rationale), and (ii) an online phase that performs near-real-time prediction by retrieving the most relevant demonstrations using a similarity policy that considers both the historical throughput pattern and its short-term changes. We evaluate the proposed framework using a real-world 5G measurement dataset that includes both driving and static scenarios across diverse applications. Our numerical results reveal that the proposed 2-shot CoT-LLM can improve mean absolute error (MAE), root mean square error (RMSE), and R²-score by up to 14.88%, 15.03%, and 22.41%, respectively, compared to the 2-shot ICL-LLM and classical baselines. Furthermore, by optimizing the number of in-context examples, we achieve additional improvements of 4.58%, 5.70%, and 4.85% in MAE, RMSE, and R²-score, respectively.
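The online retrieval step (choosing demonstrations by similarity of both the throughput pattern and its short-term changes) can be sketched as follows. The paper's actual similarity policy is not given in the abstract, so the weighted sum of two Euclidean distances, the weight alpha, and the function name are assumptions.

```python
import numpy as np

def retrieve_demonstrations(query, library, k=2, alpha=0.5):
    """Return indices of the k stored histories most similar to the query.

    Similarity combines two distances (lower is better):
      - level: distance between the raw throughput windows
      - trend: distance between their first differences (short-term changes)
    alpha is a hypothetical weight between the two terms.
    """
    q = np.asarray(query, dtype=float)
    scored = []
    for idx, history in enumerate(library):
        h = np.asarray(history, dtype=float)
        level = np.linalg.norm(q - h)
        trend = np.linalg.norm(np.diff(q) - np.diff(h))
        scored.append((alpha * level + (1 - alpha) * trend, idx))
    return [idx for _, idx in sorted(scored)[:k]]
```

With k=2 this corresponds to the 2-shot setting in the abstract: the two retrieved windows, together with their stored rationales, would be placed in the prompt ahead of the query.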
Where Do Flow Semantics Reside? A Protocol-Native Tabular Pretraining Paradigm for Encrypted Traffic Classification
Authors: Sizhe Huang, Zitong Li, Shujie Yang
First: 2026-03-09T15:15:23+00:00 · Latest: 2026-05-09T04:37:04+00:00
Abstract
Self-supervised masked modeling shows promise for encrypted traffic classification by masking and reconstructing raw bytes. Yet recent work reveals these methods fail to reduce reliance on labeled data despite costly pretraining: under frozen encoder evaluation, accuracy drops from greater than 0.9 to less than 0.47. We argue the root cause is inductive bias mismatch: flattening traffic into byte sequences destroys protocol-defined semantics. We identify three specific issues: 1) field unpredictability: random fields like ip.id are unlearnable yet treated as reconstruction targets; 2) embedding confusion: semantically distinct fields collapse into a unified embedding space; 3) metadata loss: capture-time metadata essential for temporal analysis is discarded. To address this, we propose a protocol-native paradigm that treats protocol-defined field semantics as architectural priors, reformulating the task to align with the data's intrinsic tabular modality rather than incrementally adapting sequence-based architectures. Instantiating this paradigm, we introduce FlowSem-MAE, a tabular masked autoencoder built on Flow Semantic Units (FSUs). It features predictability-guided filtering that focuses on learnable FSUs, FSU-specific embeddings to preserve field boundaries, and dual-axis attention to capture intra-packet and temporal patterns. FlowSem-MAE significantly outperforms the state of the art across datasets. With only half of the labeled data, it outperforms most existing methods trained on the full data.
Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs
Authors: Hanlin Cai, Kai Li, Houtianfu Wang, Haofan Dong, Yichen Li, Falko Dressler, Ozgur B. Akan
First: 2026-05-08T16:24:54+00:00 · Latest: 2026-05-08T16:24:54+00:00
Abstract
Federated fine-tuning (FFT) has emerged as a privacy-preserving paradigm for collaboratively adapting large language models (LLMs). Built upon federated learning, FFT enables distributed agents to jointly refine a shared pretrained LLM by aggregating local LLM updates without sharing local raw data. However, FFT-based LLMs remain vulnerable to model manipulation threats, in which adversarial participants upload manipulated LLM updates that corrupt the aggregation process and degrade the performance of the global LLM. In this paper, we propose an Augmented Model maniPulation (AugMP) strategy against FFT-based LLMs. Specifically, we design a novel graph representation learning framework that captures feature correlations among benign LLM updates to guide the generation of malicious updates. To enhance manipulation effectiveness and stealthiness, we develop an iterative manipulation algorithm based on an augmented Lagrangian dual formulation. Through this formulation, malicious updates are optimized to embed adversarial objectives while preserving benign-like parameter characteristics. Experimental results across multiple LLM backbones demonstrate that the AugMP strategy achieves the strongest manipulation performance among all competing baselines, reducing the global LLM accuracy by up to 26% and degrading the average accuracy of local LLM agents by up to 22%. Meanwhile, AugMP maintains high statistical and geometric consistency with benign updates, enabling it to evade conventional distance- and similarity-based defense methods.
Suitability of the Data Distribution Service for Next-Generation Ethernet-Based Agricultural Machinery Networking
Authors: Samuel Brodie, Henri Hornburg, Daniel Ostermeier, Maksim Pavlov, Timo Oksanen
First: 2026-05-08T13:47:26+00:00 · Latest: 2026-05-08T13:47:26+00:00
Abstract
The current state of the art in the agricultural industry for inter-manufacturer, plug-and-play communications is the ISO 11783 standard series, which mandates the use of 250 kb/s CAN bus. To support higher data rates, the ISO 23870 series is under development, defining a gigabit automotive Ethernet physical layer for next-generation machine-to-machine communication networks. However, middleware is needed to handle the complexity of the system by providing an additional layer of abstraction. It should address the future needs of the industry such as higher levels of automation, additional data logging, modern data types, quality of service configuration, and best-practice cybersecurity. Data Distribution Service (DDS) is a potential middleware for use in such a network. DDS is a standardised protocol for data sharing between distributed applications and provides many features not present in the current ISO 11783. This work analyses the extent to which DDS can be used to develop a system which meets the requirements for next-generation communication networking for agricultural machinery. A proof-of-concept design is presented, including a Task Controller and an implement, and it is shown that the requirements are fulfilled. A new DDI concept is proposed that decomposes the monolithic numeric DDI of ISO 11783 into separate typed Enums for handling group, handling feature, and SI units, enabling more flexible signal definitions. Four security configurations are tested in the proof-of-concept implementation, and it is shown that enabling security features has a significant impact on throughput.
Deadline-Driven Hierarchical Agentic Resource Sharing for AI Services and RAN Functions in AI-RAN
Authors: Haiyuan Li, Yulei Wu, Dimitra Simeonidou
First: 2026-05-08T10:22:12+00:00 · Latest: 2026-05-08T10:22:12+00:00
Abstract
AI-RAN consolidates AI services and Radio Access Network (RAN) functions onto a unified, GPU-accelerated infrastructure at the network edge. However, compute sharing between real-time RAN functions and highly heterogeneous AI services requires coordination of scheduling decisions at mismatched timescales, and placement adaptation may require service migration across nodes with non-negligible interruptions. This paper proposes a hierarchical agentic framework (HAF) for compute sharing in AI-RAN that combines a large language model (LLM)-based agent for slow-timescale placement of AI services and RAN functions with a closed-form, deadline-aware convex algorithm for fast-timescale GPU/CPU allocation. The LLM agent is further equipped with a predictive critic that filters out migrations when the induced service interruption outweighs the expected service-level objective (SLO) benefit. Experimental results show that HAF reaches 90.0% overall SLO fulfillment, a 20.5% improvement over the strongest baseline, and raises AI service request fulfillment from 51% to 85.3%. Further evaluations show that HAF retains its advantage under diverse load conditions, while the critic consistently improves SLO fulfillment across multiple open-source LLM agents.
From Map-and-Encap to BIER: Observations on Network Routing Scalability
Authors: Tianyuan Yu, Lan Wang, Beichuan Zhang, Lixia Zhang
First: 2026-05-08T00:41:20+00:00 · Latest: 2026-05-08T00:41:20+00:00
Abstract
The TCP/IP protocol stack uses IP addresses for two distinct roles: identifying hosts and locating their attachment points in the network topology. This dual purpose creates a fundamental tension that has led to routing and forwarding scalability challenges throughout the history of the Internet in unicast packet delivery and, more notably, in multicast delivery. This paper reviews the evolution of routing scalability solutions over the years and makes four observations. First, map-and-encap is a recurring architectural solution shared by all scalable unicast and multicast delivery methods, developed independently across different problem contexts. Second, a new solution tends to succeed when it can bring immediate local gains to early adopters without requiring coordination across administrative domains. Third, network routing and forwarding designs that depend on external factors, such as the number of distinct end sites or even application-specific deliveries, inherently preclude an upper bound on their scalability. Fourth, today's inter-domain routing protocol, BGP, lacks a topological abstraction equivalent to an egress router within a routing domain, thereby inherently preventing a map-and-encap solution for scalability. These observations offer insights into the design of future scalable routing system architectures.
RNG: Flat Datacenter Networks at Scale
Authors: Giacomo Bernardi, Ratul Mahajan, C. Seshadhri, Enrico Carlesso, Chinchu Merine Joseph, Saurabh Kumar, Pavan Manikonda, Luiza Popa, Randy Ram, Steven Robinson, Elizabeth Tennent
First: 2026-04-16T17:37:04+00:00 · Latest: 2026-05-07T18:34:53+00:00
Abstract
We design and deploy in production the first flat datacenter networks. Our design, called RNG, is based on quasi-random graphs. While the cost and fault-tolerance benefits of such topologies have been long known, their practical realization has been hampered by a lack of scalable routing and cabling approaches. RNG has a new distributed routing protocol that exploits the properties of random graphs to find a large number of edge disjoint paths between pairs of endpoints. It uses a novel passive optical device that internally shuffles cables, which makes its cabling complexity similar to that of fat trees. We show that RNG matches or exceeds the performance of fat trees for a range of traffic patterns, despite being up to 45% cheaper. RNG is now the default datacenter network for most workloads at Amazon.
CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure
Authors: Eric Ding, Byungsoo Oh, Bhaskar Kataria, Kaiwen Guo, Jelena Gvero, Abhishek Vijaya Kumar, Arjun Devraj, Lindsey Bowen, Atharv Sonwane, Emaad Manzoor, Rachee Singh
First: 2026-05-07T16:40:39+00:00 · Latest: 2026-05-07T16:40:39+00:00
Abstract
Evaluative claims about LLM infrastructure -- "workload X is fastest on hardware Y with software Z" -- depend on a complex configuration space spanning hardware accelerators, interconnect bandwidth, software frameworks, parallelism plans, and communication libraries. Current infrastructure evaluation benchmarks publish a small set of end-to-end numbers that do not explain why one configuration outperforms another. We present CCL-Bench, a trace-based benchmark that addresses the limitations of existing benchmarks by recording reusable evidence for every ML workload. Each contributed data point in CCL-Bench packages an execution trace, a YAML workload card, and the launch scripts. We have developed a community-extensible toolkit to compute fine-grained compute, memory, and communication efficiency metrics from this evidence. Using CCL-Bench, we surface three claims that summary-statistic benchmarks cannot support: (i) higher compute-communication overlap can coincide with longer training step time and reveal inefficient parallelization choices, (ii) doubling TPU interconnect bandwidth yields a much higher end-to-end improvement in step time than doubling GPU interconnect bandwidth on small and medium workloads, and (iii) the best-tuned configuration on one training framework can run up to 3$\times$ slower than the best-tuned configuration on a peer framework on identical hardware.
Summary / 总结
Evaluative claims about LLM infrastructure -- "workload X is fastest on hardware Y with software Z" -- depend on a complex configuration space spanning hardware accelerators, interconnect bandwidth, software frameworks, parallelism plans, and communication libraries.
Delay-Robust Deep Reinforcement Learning for Ranging-Free Channel Access under Mobility in Underwater Acoustic Networks
Authors: Huaisheng Ye, Xiaowen Ye, Liqun Fu
First: 2026-05-07T16:36:24+00:00 · Latest: 2026-05-07T16:36:24+00:00
Comments: 6 pages, 7 figures, submitted to Globecom 2026
Abstract
Long propagation delays in underwater acoustic networks (UWANs) cause spatio-temporal uncertainty, constraining channel utilization in medium access control (MAC) protocols. Node mobility within autonomous underwater vehicle scenarios exacerbates these challenges by introducing dynamic propagation delays and varying spatial topologies. We present MobiU-MAC, a deep reinforcement learning (DRL)-based MAC protocol for mobile node access in UWANs that maximizes throughput via autonomous learning. MobiU-MAC incorporates CHILL-STER, a novel DRL algorithm optimized for UWANs that is both ranging-free and delay-robust. CHILL-STER employs a credit horizon-limited $λ$-return (CHILL-Return) mechanism to achieve stable learning under asynchronous delayed rewards, while the companion spatio-temporal experience replay (STER) mechanism addresses topological changes arising from node mobility. This work also demonstrates theoretically that DRL attains optimal policy learning equivalent to a standard Markov decision process under long propagation delays without requiring ranging. Performance evaluations indicate that MobiU-MAC outperforms existing DRL-based MAC protocols for UWANs by leveraging the maximum system delay boundary without ranging overhead, supporting the effectiveness of the proposed theory and algorithm in complex underwater dynamic environments.
Summary / 总结
Long propagation delays in underwater acoustic networks (UWANs) cause spatio-temporal uncertainty, constraining channel utilization in medium access control (MAC) protocols.
Designing Capacitated Subnetworks for Shortest Path Routing
Authors: Markus Chimani, Max Ilsen
First: 2026-05-07T14:19:08+00:00 · Latest: 2026-05-07T14:19:08+00:00
Abstract
In pursuit of higher energy efficiency in computer networks, one subfield of green traffic engineering aims at reducing the size of a network during times of low traffic, while still guaranteeing the ability to route all occurring demands. In this setting, we have to simultaneously solve a network design problem (choosing connections to deactivate) and a routing problem (routing paths in the active subnetwork, adhering to some routing protocol). Interestingly, there seems to be no available method to tackle the problem as a whole for the simplest (and still most commonly used) routing paradigm: shortest path routing. State-of-the-art methods either do not consider capacities, or assume that the routing paths should not change when deactivating network connections, or separate the problem into its two constituents, first solving the network design problem (using some estimators in lieu of the precise routing protocol) and only then the actual routing problem. In this paper, we present an algorithm to tackle the full combined problem exactly via a novel integer linear program, modeling dynamically changing shortest paths. To solve it, we need to devise a special-purpose column generation method. To speed up the solution process, we further propose additional provably strengthening constraints. Now having the means to yield true optimal solutions for (small) practical instances, we can for the first time give an in-depth experimental evaluation that includes the absolute quality intrinsic to the above simplifying algorithms. It turns out that the arguably simplest method--first computing a routing, fixing it, and turning off all superfluous connections--yields solutions surprisingly close to the true optimum in practice. When considering multiple different traffic demands, a recent traffic-oblivious approach (TOCA) performs best, while being comparatively straightforward to implement.
Summary / 总结
In pursuit of higher energy efficiency in computer networks, one subfield of green traffic engineering aims at reducing the size of a network during times of low traffic, while still guaranteeing the ability to route all occurring demands.
Binary Image-Based Intrusion Detection for Operational Technology Networks: Extending the SPHBI Methodology from IoT to Modbus TCP
Authors: Aamir Omar
First: 2026-05-05T19:34:45+00:00 · Latest: 2026-05-07T09:46:06+00:00
Comments: 14 pages, 5 figures, 5 tables. Preprint
Abstract
This paper extends the Single Packet Header Binary Image (SPHBI) intrusion detection methodology from IoT to Modbus TCP, evaluating five approaches spanning a gradient of protocol depth on the CIC Modbus 2023 dataset (11.4 million packets, eight detectable attack types). TCP/IP headers alone achieve only 51.8% binary accuracy, confirming that header-level heterogeneity exploited in IoT traffic is absent in uniform SCADA environments. Adding eight bytes of application-layer information improves binary accuracy to 98.1% with just 63 parameters, directly relevant to per-packet classification on resource-constrained OT edge devices. The best-performing approach achieves 94.4% +/- 2.2pp multiclass accuracy across nine classes (95% CI [92.9%, 95.9%], 10 seeds) with 56,873 parameters, roughly 430 times fewer than comparable ResNet50-based approaches. Per-class recall analysis shows seven of eight detectable attack types identified with recall above 94%, while replay attacks remain structurally undetectable by any single-packet method.
Summary / 总结
This paper extends the Single Packet Header Binary Image (SPHBI) intrusion detection methodology from IoT to Modbus TCP, evaluating five approaches spanning a gradient of protocol depth on the CIC Modbus 2023 dataset (11.4 million packets, eight detectable attack types).
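The idea of turning packet bytes into a binary image can be sketched as follows. This is a minimal illustration, assuming one image row per byte with bits ordered most significant first. The function name, the row layout, and the padding rule are assumptions for illustration, since the abstract does not specify the image dimensions used by SPHBI.

```python
def header_to_binary_image(header: bytes, n_bytes: int = 8):
    """Return n_bytes rows of 8 bits each (MSB first).

    Shorter inputs are zero-padded and longer inputs are truncated,
    so every packet yields an image of the same shape.
    """
    padded = header[:n_bytes].ljust(n_bytes, b"\x00")
    return [[(byte >> (7 - bit)) & 1 for bit in range(8)] for byte in padded]


# Three example bytes, padded to a 4 x 8 binary image.
image = header_to_binary_image(b"\x00\x01\xff", n_bytes=4)
```

A fixed image shape is what lets a very small classifier operate on single packets, which is the setting the abstract targets for resource-constrained edge devices.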
SANEmerg: An Emergent Communication Framework for Semantic-aware Agentic AI Networking
Authors: Yong Xiao, Haoran Zhou, Yujie Zhou, Marwan Krunz
First: 2026-05-07T08:30:43+00:00 · Latest: 2026-05-07T08:30:43+00:00
Comments: Accepted at IEEE/IFIP WiOpt Workshop, Columbus, OH, USA, June 2026
Abstract
Future networking systems are envisioned to become part of an agentic AI-native ecosystem in which a vast number of heterogeneous and specialized AI agents cooperate seamlessly to fulfill complex user requirements in real time. However, traditional networking paradigms are characterized by a rigid decoupling of communication and computation, which often leads to significant inefficiencies in large-scale agentic AI networking (AgentNet) systems. Emergent communication offers a novel solution by enabling autonomous agents that support task-specific signaling protocols for information exchange and collaborative coordination. In this paper, we consider a multi-agent emergent communication framework, tailored for semantic-aware AgentNet systems in which the user's semantic intent can be automatically detected, inferred, and linked to a set of sub-tasks to be assigned to a set of agents. We investigate how communication and signaling protocols can emerge among collaborative agents with computationally bounded intelligence under stringent bandwidth constraints. Our proposed framework, called SANEmerg, is designed to facilitate the emergence of communication for collaborative task fulfillment while adhering to the physical limits of AgentNet. SANEmerg incorporates a bandwidth-adaptable importance-filter that dynamically prioritizes the transmission of higher-contribution message dimensions, ensuring robust performance in bandwidth-limited environments. Furthermore, SANEmerg integrates a complexity-regularizer grounded in the Minimum Description Length (MDL) principle to facilitate the emergence of computationally bounded signaling. Evaluated via an AgentNet prototype and extensive experimentation, SANEmerg demonstrates significant performance improvements over state-of-the-art solutions, achieving superior task accuracy while significantly reducing bandwidth and computational overhead.
Summary / 总结
Future networking systems are envisioned to become part of an agentic AI-native ecosystem in which a vast number of heterogeneous and specialized AI agents cooperate seamlessly to fulfill complex user requirements in real time.
Age of Gossip in Ring Networks With Non-Poisson Updates
Authors: Arunabh Srivastava, Sennur Ulukus
First: 2026-05-06T17:23:50+00:00 · Latest: 2026-05-06T17:23:50+00:00
Abstract
We consider a network consisting of $n$ nodes connected in a ring formation and a source that generates updates according to a renewal process and disseminates them to the ring network according to a Poisson process. The nodes in the network gossip with each other according to a push-based gossiping protocol, and disseminate version updates. Gossip between two neighbors happens at the arrivals of renewal processes with finite mean and variance. All renewal processes and Poisson processes in the network are independent but not identically distributed. We consider both uni-directional ring networks and bi-directional ring networks. We use version age of information to quantify the freshness of information at each node. Prior work has used the stochastic hybrid systems (SHS) approach or a first passage percolation (FPP) approach to analyze ring networks with edges following identical Poisson processes. In this work, we use a sample-path backtracking approach to characterize the probabilistic scaling of the version age of information of an arbitrary node in the gossip network, where each edge follows an independent but not identically distributed renewal process. We show that the version age of information of any node in the network is stochastically equivalent to $\sqrt{n}$ at any time instant after the node has received its first update from the source.
Summary / 总结
We consider a network consisting of $n$ nodes connected in a ring formation and a source that generates updates according to a renewal process and disseminates them to the ring network according to a Poisson process.
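For readers unfamiliar with the metric, version age is commonly defined in the gossip literature as the number of versions by which a node lags the source. The symbols below are assumptions for illustration and do not appear in the abstract: $N_s(t)$ is the latest version generated at the source by time $t$, and $N_i(t)$ is the latest version held by node $i$.

```latex
X_i(t) = N_s(t) - N_i(t), \qquad X_i(t) \ge 0
```

Under this definition the age of node $i$ grows by one whenever the source generates a new version and drops whenever node $i$ receives a fresher version from the source or a neighbor.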
Optimizing Split Learning Latency in TinyML-Based IoT Systems
Authors: Zied Jenhani, Mounir Bensalem, Jasenka Dizdarević, Admela Jukan
First: 2025-07-22T13:50:12+00:00 · Latest: 2026-05-06T12:54:05+00:00
Comments: This paper is uploaded here for the research community and is therefore for non-commercial purposes
Abstract
Split learning (SL) addresses the limitation of running deep learning inference directly on low-power edge/IoT nodes by executing part of the inference process on the sensor and offloading the remainder to a companion device. Despite its promise, the inference latency of SL on constrained hardware under realistic low-power wireless protocols remains unexplored. This paper presents the first experimental latency benchmark of TinyML-based SL on ESP32-S3 boards, comparing four wireless communication protocol solutions (UDP, TCP, ESP-NOW, BLE). We also analyze the impact of split point choice across different models (MobileNet-V2 and ResNet50) in terms of communication and computation overhead as a way to minimize the end-to-end inference latency. We propose a Beam Search-based algorithm for split point optimization that minimizes end-to-end latency, and compare it with other methods, including Greedy Search, First-Fit, Random-Fit, and Brute Force. ESP-NOW achieves the best RTT (3.6 s) and serves as the base protocol for the algorithm, which delivers near-optimal latency with a processing time of 0.1 s for 5 devices.
Summary / 总结
Split learning (SL) addresses the limitation of running deep learning inference directly on low-power edge/IoT nodes by executing part of the inference process on the sensor and offloading the remainder to a companion device.
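The split point trade-off between computation and communication can be made concrete with a small brute-force search, one of the baseline methods the abstract names. This is a sketch under stated assumptions, not the paper's algorithm: it covers a single split between two devices, the latency model (device compute, plus transfer, plus companion compute) is assumed, and all timings are invented for illustration.

```python
def best_split(device_ms, companion_ms, transfer_ms):
    """Exhaustively pick the split index k that minimizes end-to-end latency.

    Layers [0, k) run on the sensor device and layers [k, n) on the companion.
    transfer_ms[k] is the assumed cost of sending the tensor that crosses
    the boundary at split k, so it has n + 1 entries.
    """
    n = len(device_ms)
    if len(companion_ms) != n or len(transfer_ms) != n + 1:
        raise ValueError("expected n per-layer timings and n + 1 transfer costs")
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        latency = sum(device_ms[:k]) + transfer_ms[k] + sum(companion_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Hypothetical per-layer timings in milliseconds for a three-layer model.
split, latency = best_split(
    device_ms=[10, 20, 30],
    companion_ms=[1, 2, 3],
    transfer_ms=[50, 5, 40, 1],
)
```

Exhaustive search is cheap for one split but grows quickly when a model is partitioned across several devices, which is the setting where a beam search over candidate split sequences becomes attractive.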
AFL-ICP: Enhancing Industrial Control Protocol Reliability via Specification-Guided Fuzzing
Authors: Jiaying Meng, Xuewei Feng, Qi Li, Min Liu, Ke Xu
First: 2026-05-06T11:07:24+00:00 · Latest: 2026-05-06T11:07:24+00:00
Comments: 11 pages, 5 figures
Abstract
Industrial Control Protocols (ICPs) are critical to the reliability and stability of industrial infrastructure, yet their security is fundamentally compromised by a specification-blindness bottleneck. Modern fuzzers, constrained by observation-driven inference, struggle to penetrate deep protocol states or detect subtle semantic deviations. In this paper, we present AFL-ICP, an autonomous fuzzing framework that pioneers a specification-driven paradigm. AFL-ICP features a context-aware specification formalization pipeline to transform complex specifications into rigorous machine-executable grammars. Building on this formalized specification, AFL-ICP leverages LLMs to enable automated protocol adaptation and seed generation, allowing for rapid extension to new protocols with minimal manual effort. Additionally, it includes an LLM-powered differential checker that cross-references implementation outputs with specification requirements to detect subtle semantic and logic bugs that existing fuzzers cannot detect. We implement AFL-ICP and evaluate it on four widely used ICPs, including both open-source and closed-source variants. Results show that AFL-ICP significantly outperforms state-of-the-art fuzzers in coverage and uncovers 24 previously unknown vulnerabilities, for which we have received acknowledgments from affected vendors (e.g., FreyrSCADA). Specifically, the identified vulnerabilities include 16 semantic and logic bugs that can silently disrupt industrial operations and degrade service availability.
Summary / 总结
Industrial Control Protocols (ICPs) are critical to the reliability and stability of industrial infrastructure, yet their security is fundamentally compromised by a specification-blindness bottleneck.
Securing the Web with HSTS-Enforced
Authors: Aaron van Diepen, Adrian Zapletal, Fernando Kuipers
First: 2026-05-06T08:33:12+00:00 · Latest: 2026-05-06T08:33:12+00:00
Abstract
TLS stripping attacks expose sensitive web traffic by forcing secure HTTPS connections to fall back to unencrypted HTTP. At present, protection against these attacks relies on website operators explicitly opting into security by deploying mechanisms such as HTTP Strict Transport Security (HSTS) headers. These mechanisms have significant limitations: some are weak or difficult to configure, which raises the risk of misconfiguration and reduces practical adoption; others violate HTTP backward compatibility; at least one can even be abused to enable unintended user tracking.
We introduce HSTS-Enforced, a mechanism that eliminates the remaining attack surface for TLS stripping while still allowing operators to securely specify that their websites need to be accessed over HTTP when necessary, thereby maintaining accessibility. To achieve this, we flip the current opt-in security model to an opt-out model: all connections default to HTTPS, and operators can explicitly opt out if their websites require HTTP using so-called HTTP-Required indicators. We propose two such HTTP-Required indicators: a new DNS record and an HTTP-Required Preload list. We evaluate HSTS-Enforced under multiple deployment scenarios, demonstrating that it blocks all practical TLS stripping attempts while maintaining compatibility for sites that require HTTP - without introducing overhead in the typical case. Finally, we outline a practical transition path to accelerate global adoption.
Summary / 总结
TLS stripping attacks expose sensitive web traffic by forcing secure HTTPS connections to fall back to unencrypted HTTP.
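The opt-out model described above can be sketched in a few lines. The function name and the in-memory set are hypothetical. In the proposal the HTTP-Required indicator would come from a DNS record or a preload list, neither of which is modeled here.

```python
def choose_scheme(host, http_required):
    """Default to HTTPS; fall back to HTTP only for hosts that opted out.

    http_required is a set of lowercase hostnames standing in for the
    HTTP-Required indicators (DNS record or preload list entry).
    """
    return "http" if host.lower() in http_required else "https"


# Hypothetical opt-out list containing a single legacy host.
opted_out = {"legacy.example"}
default_scheme = choose_scheme("shop.example", opted_out)
legacy_scheme = choose_scheme("Legacy.Example", opted_out)
```

The key design point is the direction of the default: a host with no indicator at all is treated as HTTPS-only, so stripping a header or blocking a lookup cannot downgrade the connection.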
SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting
Authors: Kuan-Hao Tseng, Niruth Bogahawatta, Yasod Ginige, Kosta Dekic, Arunan Sivanathan, Suranga Seneviratne
First: 2026-05-06T06:15:08+00:00 · Latest: 2026-05-06T06:15:08+00:00
Abstract
Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds. We argue this is because existing agents do not encode the disciplined, layer-by-layer methodology that human network engineers use, and instead rely on free-form deliberation that conflates evidence acquisition with hypothesis commitment. We present SADE (Symptom-Aware Diagnostic Escalation), an agent that encodes the classical Cisco troubleshooting methodology as an explicit policy. SADE pairs a phase-gated diagnostic workflow, which separates evidence acquisition from hypothesis commitment, with a routed library of fault-family skills and high-yield diagnostic helpers. On a held-out set of 523 incidents from the public NIKA benchmark covering eleven unseen scenarios, SADE improves root-cause F1 by 37 percentage points over a ReAct + GPT-5 baseline; a model-controlled comparison against the same Claude Sonnet backend without the SADE policy attributes 22 of those points to the diagnostic policy alone, showing that the gain is not a side-effect of the model upgrade.
Summary / 总结
Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds.
Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV
Authors: Maoxin Ji, Qiong Wu, Pingyi Fan, Cui Zhang, Nan Cheng, Wen Chen, Khaled B. Letaief
First: 2026-05-06T02:59:18+00:00 · Latest: 2026-05-06T02:59:18+00:00
Comments: This paper has been submitted to TMC
Abstract
This paper investigates a multi-Unmanned Aerial Vehicle (UAV) joint base station-assisted Internet of Vehicles (IoV) task offloading system in dense urban environments. To minimize system delay and energy consumption under strict coupling constraints, the complex non-convex optimization problem is decoupled into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second-Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory of each UAV, ensuring adaptive network coverage. Second, a novel hybrid resource scheduling paradigm synergizing Deep Reinforcement Learning (DRL) and Large Language Models (LLMs) is developed. Within this framework, the DRL agent dictates the initial resource allocation, while the LLM acts as a semantic macro-scheduler to rectify long-tail allocation imbalances for failed and surplus tasks. Crucially, a reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions, thereby ensuring policy convergence. Finally, the task offloading ratios are precisely determined via Linear Programming (LP) within an alternating optimization loop. Simulation results demonstrate that the proposed method significantly outperforms traditional multi-agent reinforcement learning baselines in terms of task success rate and system efficiency.
Summary / 总结
This paper investigates a multi-Unmanned Aerial Vehicle (UAV) joint base station-assisted Internet of Vehicles (IoV) task offloading system in dense urban environments.
Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers
Authors: Hongyu Hè, Minhao Jin, Maria Apostolaki
First: 2026-05-06T00:42:32+00:00 · Latest: 2026-05-06T00:42:32+00:00
Comments: 23 pages, 12 figures, 4 tables
Abstract
RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming. Yet their performance can degrade severely under network conditions where strong performance is still achievable. Identifying such conditions and quantifying the resulting performance gap is intractable by enumeration, while the sequential and closed-loop nature of RL controllers makes formal verification methods impractical.
We present ReGuard, a framework that discovers worst-case scenarios for a given RL controller and protects it against them at inference time without retraining. Discovery is formulated as a bilevel regret-maximization problem, which yields a certified lower bound on the worst-case performance gap. The discovered trajectories are then analyzed as counterfactuals and compiled into lightweight logic rules that intervene only when a risky state is detected, leaving the controller's behavior unchanged otherwise.
We evaluate ReGuard across three RL-based network controllers: Pensieve, Sage, and Park. ReGuard discovers scenarios in which the controller's performance is 43$-$64% worse than what is achievable. ReGuard not only discovers gaps 57% to 6$\times$ larger than those found by the strongest baselines but also shrinks them by 79$-$85% via lightweight rule-based protection while preserving nominal performance. ReGuard's protection extends beyond the scenarios it discovers, improving performance across a wider range of network conditions.
Summary / 总结
RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming.
Resilient AI Supercomputer Networking using MRC and SRv6
Authors: Joao Araujo, Alex Chow, Mark Handley, Ryder Lewis, Christoph Paasch, Jitendra Padhye, Michael Papamichael, Greg Steinbrecher, Amin Tootoonchian, Lihua Yuan, S. Anantharamu, Abhishek Dosi, Mohit Garg, Mahdieh Ghazi, Torsten Hoefler, Deepal Jayasinghe, Jithin Jose, Abdul Kabbani, Guohan Lu, Yang Wang, K. Doddapaneni, Murali Garimella, Vipin Jain, Yanfang Le, H. Nagulapalli, S. Narayanan, Rong Pan, Rathina Sabesan, Raghava Sivaramu, Rip Sohan, Eric Davis, Dragos Dumitrescu, Mohan Kalkunte, Bhaswar Mitra, Guglielmo Morandin, Adrian Popa, Costin Raiciu, Eric Spada, John Spillane, Niranjan Vaidya, Aviv Barnea, Idan Burstein, Elazar Cohen, Yamin Friedman, Noam Katz, Masoud Moshref, Yuval Shpigelman, Shahaf Shuler, Shy Shyman, Sayantan Sur
First: 2026-05-05T22:40:47+00:00 · Latest: 2026-05-05T22:40:47+00:00
Comments: 18 pages, 22 figures
Abstract
Tail latency dominates the performance of synchronous pretraining jobs when running at very large scales. We describe a three-pronged approach: (1) a new RDMA-based transport protocol, MRC, that sprays across many paths and actively load-balances between them, eliminating the issue of flow collisions; (2) the use of multi-plane Clos topologies to get the benefits of high switch radix and redundancy, allowing training clusters of well over 100K GPUs to be built as two-tier topologies while increasing physical redundancy; and (3) the use of static source-routing using SRv6 to allow MRC the freedom to bypass failures by itself. We describe our experiences running MRC and static SRv6 routing in production in OpenAI and Microsoft's largest training clusters, where it has been used to train the latest frontier models. We demonstrate how MRC allows AI training jobs to ride out many network failures that previously would have interrupted training.
Summary / 总结
Tail latency dominates the performance of synchronous pretraining jobs when running at very large scales.
Sequential vs. Simultaneous Entanglement Swapping under Optimal Link-Layer Control
Authors: Priyam Srivastava, Akshat R. Sabavat, Siddharth Jain, Alan Scheller-Wolf, Sridhar Tayur, David Tipper, Prashant Krishnamurthy, Amy Babay, Kaushik P. Seshadreesan
First: 2026-05-05T17:59:22+00:00 · Latest: 2026-05-05T17:59:22+00:00
Comments: Submitted to IEEE QCE 2026
Abstract
Connection-less, packet-switched quantum network architectures distribute entanglement across multi-hop paths through sequential entanglement swapping, in which each node acts on purely local state information. The architectural advantages over the connection-oriented alternative -- simultaneous SWAP-ASAP -- are compelling, but sequential swapping holds partial chains in intermediate buffers between successive swaps, exposing them to memory decoherence in a way simultaneous SWAP-ASAP avoids by design. We present a proof-of-principle study at fixed chain length $n = 4$ in which each elementary link is governed by a fixed reinforcement-learning policy optimizing the secret-key rate of the six-state protocol, leaving the network-layer protocol as the sole independent variable. Sweeping the network-layer memory coherence time $T_c^{\mathrm{ext}}$ over four orders of magnitude reveals a clear regime structure governed by the dimensionless ratio $T_c^{\mathrm{ext}}/τ$, where $τ$ is the per-link entanglement heralding latency. Simultaneous SWAP-ASAP delivers a constant rate across the full sweep. Sequential swapping, by contrast, collapses to zero end-to-end deliveries below $T_c^{\mathrm{ext}}/τ= 25$, and begins recovering at $T_c^{\mathrm{ext}}/τ= 50$. It remains limited by the simultaneous rate, which it saturates only at the relaxed end of the sweep. These results suggest that the connection-less penalty is a near-term phenomenon tied to present-day memory coherence rather than a fundamental property of sequential swapping.
Summary / 总结
Connection-less, packet-switched quantum network architectures distribute entanglement across multi-hop paths through sequential entanglement swapping, in which each node acts on purely local state information.
Surviving the Edge: Federated Learning under Networking and Resource Constraints
Authors: Mike Mwanje, Okemawo Obadofin, Theophilus Benson, Joao Barros
First: 2026-05-05T15:30:11+00:00 · Latest: 2026-05-05T15:30:11+00:00
Abstract
Motivated by the growing proliferation of federated learning (FL) in edge environments, we present the first systematic characterization of transport-layer breaking points in FL systems operating under conditions of highly constrained network and compute resources. Using a reproducible testbed with chaos engineering tools, we evaluate Flower under progressively degraded network conditions representative of resource-constrained deployments in Africa and similar environments. Our empirical investigation reveals a fundamental mismatch between FL's burst-idle communication pattern and standard TCP connection management. We identify precise operational boundaries: FL training catastrophically fails at 5-second one-way latency due to TCP handshake timeouts, above 50% packet loss due to buffer exhaustion, and with 90% client dropout rates. Through systematic analysis of connection patterns during training rounds, we demonstrate that FL's periodic model update bursts, separated by extended local training periods, violate the assumptions underlying default TCP configurations. To validate the significance of these findings, we show that adjusting just three TCP connection management parameters can significantly reduce training time under extreme latency, proving that transport-layer awareness is not merely beneficial but essential for FL deployment at the network edge. Our characterization methodology and findings provide practitioners with concrete thresholds for determining when standard FL deployments will fail and when advanced reliability techniques become necessary.
Summary / 总结
Motivated by the growing proliferation of federated learning (FL) in edge environments, we present the first systematic characterization of transport-layer breaking points in FL systems operating under conditions of highly constrained network and compute resources.
Architecture and protocols for all-photonic quantum repeaters
Authors: Naphan Benchasattabuse, Michal Hajdušek, Rodney Van Meter
Venue: 2024 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 1879-1889 (2024)
First: 2023-06-06T15:08:50+00:00 · Latest: 2026-05-05T15:27:57+00:00
Comments: Extended journal version of this paper can be found at https://doi.org/10.1109/TQE.2026.3653126 . 11 pages, 7 figures, (improve protocol details); comments welcome!
Abstract
The all-photonic quantum repeater scheme, utilizing a type of graph state called the repeater graph state (RGS), promises resilience to photon losses and operational errors, offering a fast Bell pair generation rate limited only by the RGS creation time (rather than enforced round-trip waits). While existing research has predominantly focused on RGS generation and secret key sharing rate analysis, there is a need to extend investigations to encompass broader applications, such as distributed computation and teleportation, the main tasks envisioned for the Quantum Internet. Here we propose a new emitter-photonic qubit building block and an RGS protocol that addresses several key considerations: end node involvement in connection establishment, decoding of logical qubits within the RGS, and computing the Pauli frame corrections at each participating node to ensure the desired correct end-to-end Bell pair state. Our proposed building block significantly reduces the total number of emissive quantum memories required for end nodes and seamlessly integrates all-photonic and memory-based repeaters under the same communication protocol. We also present an algorithm for decoding logical measurement results, employing graphical reasoning based on graph state manipulation rules.
Summary / 总结
The all-photonic quantum repeater scheme, utilizing a type of graph state called the repeater graph state (RGS), promises resilience to photon losses and operational errors, offering a fast Bell pair generation rate limited only by the RGS creation time (rather than enforced round-trip waits).
Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones
Authors: Andrea Iannoli, Lorenzo Gigli, Luca Sciullo, Angelo Trotta, Marco Di Felice
First: 2026-05-05T14:14:57+00:00 · Latest: 2026-05-05T14:14:57+00:00
Comments: 15 pages, 5 figures. This paper has been accepted for presentation at the 27th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2026)
Abstract
Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long-running closed-loop execution. This paper presents a mission-agnostic, agent-enhanced LLM framework for UAV swarm control, where users express mission objectives in natural language and the system autonomously executes them through grounded, real-time interactions. The proposed architecture combines an LLM-based Agent Core with a Model Context Protocol (MCP) gateway and a Web-of-Drones abstraction based on W3C Web of Things (WoT) standards. By exposing drones, sensors, and services as standardized WoT Things, the framework enables structured tool-based interaction, continuous state observation, and safe actuation without relying on code generation. We evaluate the framework using ArduPilot-based simulation across four swarm missions and six state-of-the-art LLMs. Results show that, despite strong reasoning abilities, current general-purpose LLMs still struggle to achieve reliable execution - even for simple swarm tasks - when operating without explicit grounding and execution support. Task-specific planning tools and runtime guardrails substantially improve robustness, while token consumption alone is not indicative of execution quality or reliability.
Summary / 总结
Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long-running closed-loop execution.
Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving
Authors: Zongze Li, Jingyu Liu, Zhen Xu, Yineng Zhang, Tahseen Rabbani, Ce Zhang
Venue: ICML 2026
First: 2026-03-09T06:11:23+00:00 · Latest: 2026-05-05T10:21:59+00:00
Comments: 19 pages, 11 figures. Accepted at ICML 2026
Abstract
Prefill-Decode (PD) disaggregation has become the standard architecture for modern LLM inference engines, which alleviates the interference of two distinctive workloads. With the growing demand for multi-turn interactions in chatbots and agentic systems, we re-examined PD in this case and found two fundamental inefficiencies: (1) every turn requires prefilling the new prompt and response from the last turn, and (2) repeated KV transfers between prefill and decode nodes saturate the bandwidth, leading to high latency and even service degradation. Our key insight is that not all prefill operations are equally disruptive: append-prefill, which processes only the new input tokens while reusing cached KV states, incurs an order-of-magnitude smaller decoding slowdown than full prefill. This motivates routing append-prefill to decode nodes locally. However, through comprehensive analysis, we show that no single fixed routing strategy satisfies all Service Level Objectives (SLOs) simultaneously. Based on this insight, we propose Prefill Prefill-capable Decode (PPD) disaggregation, a dynamic routing system that decides when to process Turn 2+ requests locally on decode nodes using cached KV states. PPD adapts to varying SLOs via configurable weights and seamlessly integrates with traditional PD deployments. With extensive evaluations, we show that PPD reduces Turn 2+ time-to-first-token (TTFT) by about 68% while maintaining competitive time-per-output-token (TPOT), effectively alleviating KV transfer congestion under high load. PPD provides a flexible and efficient paradigm for multi-turn LLM serving.
Summary / 总结
Prefill-Decode (PD) disaggregation has become the standard architecture for modern LLM inference engines, which alleviates the interference of two distinctive workloads.
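The routing decision described in the PPD abstract above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Request` fields, the `route` function, the `decode_load` utilization figure, and the two weights `w_ttft` and `w_tpot` are hypothetical stand-ins for the "configurable weights" the abstract mentions, not the scoring rule used by the authors.

```python
from dataclasses import dataclass


@dataclass
class Request:
    turn: int           # 1 for the first turn of a conversation
    new_tokens: int     # tokens appended in this turn
    cached_tokens: int  # tokens whose KV states already sit on the decode node


def route(req: Request, decode_load: float,
          w_ttft: float = 0.5, w_tpot: float = 0.5) -> str:
    """Return "decode" to run append-prefill locally, otherwise "prefill".

    decode_load is a hypothetical utilization figure in [0, 1].
    """
    # First turns have no cached KV states, so they need a full prefill.
    if req.turn == 1 or req.cached_tokens == 0:
        return "prefill"
    # Share of the context that can be reused instead of being transferred.
    reuse_fraction = req.cached_tokens / (req.cached_tokens + req.new_tokens)
    benefit = w_ttft * reuse_fraction   # assumed TTFT gain from staying local
    penalty = w_tpot * decode_load      # assumed TPOT cost on a busy decode node
    return "decode" if benefit > penalty else "prefill"
```

With equal weights, a Turn 2 request that reuses most of its context on a lightly loaded decode node stays local, while raising `w_tpot` on a saturated node sends the same request to a prefill node.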
Tiny-Twin: A CPU-Native Full-stack Digital Twin for NextG Cellular Networks
Authors: Ali Mamaghani, Ushasi Ghosh, Srinivas Shakkottai, Dinesh Bharadia, Ish Kumar Jain
First: 2026-01-13T04:48:03+00:00 · Latest: 2026-05-05T08:32:11+00:00
Abstract
Modern wireless applications demand testing environments that capture the full complexity of next-generation (NextG) cellular networks. While digital twins promise realistic emulation, existing solutions often compromise on physical-layer fidelity and scalability or depend on specialized hardware. We present Tiny-Twin, a CPU-native, full-stack digital twin framework that enables realistic, repeatable 5G experimentation on commodity CPUs. Tiny-Twin integrates time-varying multi-tap convolution with a complete 5G protocol stack, supporting plug-and-play replay of diverse channel traces. Through a redesigned software architecture and system-level optimizations, Tiny-Twin supports fine-grained convolution entirely in software. With built-in real-time RIC integration and per-User Equipment (UE) channel isolation, it facilitates rigorous testing of network algorithms and protocol designs. Our evaluation shows that Tiny-Twin scales to multiple concurrent UEs while preserving protocol timing and end-to-end behavior, delivering a practical middle ground between low-fidelity simulators and high-cost hardware emulators. We release Tiny-Twin as an open-source platform to enable accessible, high-fidelity experimentation for NextG cellular research.
Summary / 总结
Modern wireless applications demand testing environments that capture the full complexity of next-generation (NextG) cellular networks.
DACP: A Scientific Data Access and Collaboration Protocol
Authors: Zhihong Shen, Xiaojie Zhu, Zhenjing Cheng, Hao Ren, Zhaoji Liang, Changfa Lu
First: 2026-05-05T06:33:55+00:00 · Latest: 2026-05-05T06:33:55+00:00
Comments: 10 pages, 6 figures
Abstract
Scientific computing is rapidly entering a data-intensive era. However, existing general-purpose network protocol stacks face limitations in eliminating data silos and improving data accessibility and interoperability, making it difficult to effectively meet the demands of emerging paradigms such as AI4Science. To address these challenges, we propose the Data Access and Collaboration Protocol (DACP). DACP defines the Streaming Data Frame (SDF) as its core data model. Through Unified Resource Identification, columnar stream framing, and a reverse supply mechanism, DACP enables data discovery, in-situ computation, and the streaming return of results across scientific data centers, thereby facilitating efficient cross-domain collaboration. Furthermore, this paper introduces faird, a reference server implementation of DACP. This work provides a viable path for building scalable and collaborative scientific data infrastructures.
Summary / 总结
Scientific computing is rapidly entering a data-intensive era.
ARC: Consistent, Low-Latency Delivery via Receiver-Side Scheduling
Authors: Michael Luby
First: 2025-11-21T02:32:33+00:00 · Latest: 2026-05-05T00:54:03+00:00
Comments: 30 pages, 6 figures, 1 table
Abstract
Applications such as cloud gaming, video streaming, telemetry, ML inference, and data transfer provide a better experience when data is released at the receiver with timing reflecting how the data enters the sender. In practice, network delay variation and recovery dynamics at the receiver distort this timing even when transports deliver all packets correctly, producing visible jitter, stalls, and unstable playback. Many such applications operate best when delivery preserves this timing behavior and its implied order; out-of-order or irregular delivery can significantly degrade performance even when all data eventually arrives. We present a lightweight receiver-side release scheduling protocol, Adaptive Release Control (ARC), that restores this timing at the receiver. ARC releases recovered data in a manner that follows the sender's timing, maintaining ordering and limiting reordering when necessary while producing smooth delivery with minimal added latency given network conditions. It operates entirely on the receiver clock and requires no feedback, synchronization, or changes to the underlying transport. As an example, we integrate ARC into LT3, a network-layer system currently deployed as a software overlay that forwards traffic without altering the transport protocols it carries, where ARC functions as an independent module that regulates release timing for forwarded data. Evaluating LT3 with ARC on a cloud-gaming workload shows that the protocol removes virtually all large jitter excursions and yields release intervals that closely match the sender's timing, translating into improved perceptual smoothness. Broader latency improvements arise from the behavior of the full LT3 system. The benefits of ARC extend to transport protocols carried over LT3, including TCP, QUIC, WebRTC, UDP, and RTP, as preserving sender timing improves their behavior across a wide range of conditions.
Summary / 总结
Applications such as cloud gaming, video streaming, telemetry, ML inference, and data transfer provide a better experience when data is released at the receiver with timing reflecting how the data enters the sender.
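To make the idea of receiver-side release scheduling in the ARC abstract above concrete, here is a minimal Python sketch of a classic playout (jitter) buffer. It is not ARC's algorithm. It assumes each packet carries a sender-relative timestamp `send_ts`, which the abstract does not state, and the function name `release_times` and the fixed `playout_delay` are hypothetical.

```python
def release_times(packets, playout_delay):
    """Compute release times on the receiver clock.

    packets: list of (send_ts, arrival_ts) tuples in sender order.
    playout_delay: fixed buffering delay added after the first arrival.
    """
    if not packets:
        return []
    first_send, first_arrival = packets[0]
    releases = []
    previous = float("-inf")
    for send_ts, arrival_ts in packets:
        # Target time that reproduces the sender's spacing at the receiver.
        target = first_arrival + playout_delay + (send_ts - first_send)
        # Never release before arrival, and never before the previous packet.
        release = max(target, arrival_ts, previous)
        releases.append(release)
        previous = release
    return releases
```

For packets sent every 10 time units that arrive with jitter, the release times fall back onto a 10-unit grid as long as each packet arrives before its target. A packet that arrives late is released on arrival, which is where an adaptive scheme would have to do better than this fixed-delay sketch.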
SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference
Authors: Hongyao Liu, Liuqun Zhai, Junyi Wang, Zhengru Fang
First: 2026-04-23T02:55:31+00:00 · Latest: 2026-05-04T22:59:12+00:00
Comments: Withdrawn by the authors due to an incorrect assumption in the model definition in Section 4, which affects the conclusions
Abstract
Efficient inference for on-device Large Language Models (LLMs) remains challenging due to limited hardware resources and the high cost of the prefill stage, which processes the full input context to construct Key-Value (KV) caches. We present SparKV, an adaptive KV loading framework that combines cloud-based KV streaming with on-device computation. SparKV models the cost of individual KV chunks and decides whether each chunk should be streamed or computed locally, while overlapping the two execution paths to reduce latency. To handle fluctuations in wireless connectivity and edge resource availability, SparKV further refines offline-generated schedules at runtime to rebalance communication and computation costs. Experiments across diverse datasets, LLMs, and edge devices show that SparKV reduces Time-to-First-Token by 1.3x to 5.1x with negligible impact on response quality, while lowering per-request energy consumption by 1.5x to 3.3x, demonstrating its robustness and practicality for real-world on-device deployment.
Summary / 总结
Efficient inference for on-device Large Language Models (LLMs) remains challenging due to limited hardware resources and the high cost of the prefill stage, which processes the full input context to construct Key-Value (KV) caches.
Degeneracy-Aware Functional and Algorithmic Resilience in Virtualized 6G Networks Under Correlated Failures
Authors: Mohamed Khalafalla Hassan, Indrakshi Dey
First: 2026-05-04T18:03:33+00:00 · Latest: 2026-05-04T18:03:33+00:00
Comments: 6 Pages, 7 figures
Abstract
Redundancy is widely used to sustain service continuity in programmable and virtualized networks; however, replicated functions often share platforms, software stacks, and control dependencies, making them vulnerable to correlated failures. Consequently, replica counts alone may overestimate true resilience. This paper adopts a degeneracy-aware perspective, where robustness depends on the availability of structurally diverse yet functionally equivalent alternatives. We formalize this perspective through three complementary metrics: the Functional Substitution Score (FSS), which quantifies structurally distinct substitutes for a function; the Algorithmic Resilience Quotient (ARQ), which measures diversity among algorithms that remain comparable in delivered performance; and the Multi-Layer Degeneracy Index (MLDI), which captures how functional diversity is distributed across architectural layers. Using targeted disruption protocols on synthesized data, we show that redundancy and robustness can diverge substantially. The results show that FSS separates structural diversity from replica count, ARQ distinguishes genuine algorithmic alternatives from near-duplicate implementations, and MLDI captures cross-layer buffering that remains hidden under redundancy-only analysis. These findings establish degeneracy as a practical resilience primitive for open, disaggregated, and virtualized 6G systems.
Summary / 总结
Redundancy is widely used to sustain service continuity in programmable and virtualized networks; however, replicated functions often share platforms, software stacks, and control dependencies, making them vulnerable to correlated failures.
Tool Use as Action: Towards Agentic Control in Mobile Core Networks
Authors: Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An
First: 2026-05-04T16:49:24+00:00 · Latest: 2026-05-04T16:49:24+00:00
Abstract
Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and autonomy through agents capable of planning, reasoning, and acting, while also opening up new business opportunities. Under this vision, existing network functions are expected to evolve into AI-enabled agents and tools that deliver both connectivity and beyond-connectivity services. As an initial attempt to move toward this vision, this paper presents a tool-based interface design and an experimental prototype that are based on agentic AI for the mobile core network, with the Model Context Protocol (MCP) and the Agent2Agent (A2A) protocol as foundational protocols. MCP is selected to design the interface between the agent and network tools, and the A2A protocol is used for message exchange between AI agents. In such an experimental setup, we analyze packet-level message flows between the agents, tools, and network functions and break down the latency of end-to-end operations, starting from the prompt injection until the completion of the input task. This work demonstrates how an AI agent-based core network combined with network-specific tools can be utilized in next generation mobile systems to execute intent-based tasks.
Summary / 总结
Artificial Intelligence (AI) will play an essential role in 6G.
Beyond State Machines: Executing Network Procedures with Agentic Tool-Calling Sequences
Authors: Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An
First: 2026-05-04T13:34:20+00:00 · Latest: 2026-05-04T13:34:20+00:00
Abstract
Agentic AI will be an essential enabling technology for designing future mobile communication systems, which could provide flexible and customized services, automate complex network operations, and drive autonomous decision-making across the network. This work studies how Large Language Model (LLM)-based network AI agents can be utilized to execute network procedures expressed as sequences of tool invocations. We investigate four approaches, which differ in how the agent obtains the procedure and in how execution is distributed between the agent and the underlying tools. We evaluated the latency and execution correctness across these approaches using a User Equipment (UE) IP allocation procedure as a case study. Furthermore, we conduct a stress test to examine how many sequential procedural steps an LLM agent can reliably execute before failure. Our results show that approaches relying on iterative agent-side reasoning incur higher latency and are more prone to execution errors, while approaches where the procedure is encapsulated within a single tool, which internally orchestrates the required steps by invoking other tools, reduce latency by limiting repeated reasoning. The stress-test results further show that the model with advanced tool-calling capability maintains reliable execution over longer procedures than the other evaluated models; however, all models exhibit reliability degradation as procedure length increases, revealing clear execution limits in multi-step tool-based workflows. To systematically analyze failures in procedure execution, we introduce a procedure-specific error taxonomy that categorizes deviations in multi-step procedural execution.
Summary / 总结
Agentic AI will be an essential enabling technology for designing future mobile communication systems, which could provide flexible and customized services, automate complex network operations, and drive autonomous decision-making across the network.
IteRate: Autonomous AI Synthesis of In-Kernel eBPF Wi-Fi Rate Control Algorithms
Authors: James Lynch, Ziqian Liu, Snehadeep Gayen, Om Chabra, Hari Balakrishnan
First: 2026-05-04T12:45:54+00:00 · Latest: 2026-05-04T12:45:54+00:00
Abstract
Wi-Fi rate adaptation remains a persistent challenge in wireless networking. Deployed algorithms like Minstrel-HT have remained largely stagnant for over a decade, relying on hand-tuned heuristics that fail to generalize to the complexity of modern wireless environments. We present IteRate, an autonomous research system that closes the loop on rate control development. IteRate uses a multi-agent AI architecture to conduct the full scientific cycle: formulating hypotheses, writing eBPF programs that run inside the Linux kernel, deploying them over-the-air to Wi-Fi devices, collecting fine-grained telemetry for analysis, and iterating based on experimental evidence, all without human intervention. IteRate makes three contributions: (1) a novel kernel module that exposes per-frame hardware telemetry including modulation and coding schemes (MCS) and retry counts to eBPF programs, (2) a structured agentic AI architecture employing specialized agents for algorithm design, experiment execution, and data analysis, coordinated via a hypothesis-driven research protocol with persistent knowledge, and (3) a closed-loop pipeline that automates the cross-compilation, deployment, and evaluation of in-kernel logic onto embedded Wi-Fi targets.
On a 58-node testbed running five workloads, relative to the well-known Minstrel algorithm, IteRate achieves 21% faster web-page loads, 7% higher video quality of experience (QoE), and 21% higher peak throughput. Our work demonstrates that AI agents, when equipped with appropriate kernel-level hooks and a disciplined scientific workflow, can effectively automate the research required to design Wi-Fi rate controllers.
Summary / 总结
Wi-Fi rate adaptation remains a persistent challenge in wireless networking.
A Protocol-Independent Transport Architecture
Authors: Kimiya Mohammadtaheri, David Gao, Samuel Zhang, Matthew Chen, Eric Su, Pengyu Ji, Saad Syed, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo
First: 2026-05-04T04:20:04+00:00 · Latest: 2026-05-04T04:20:04+00:00
Abstract
The network transport layer is increasingly implemented in the NIC hardware to meet the performance demands of modern workloads, but this has made it difficult to evolve or deploy new transport protocols. Existing approaches either fix protocol logic in the data-path or build protocol-specific assumptions into the architecture that limit the range of protocols that can be supported on a single hardware substrate.
We present PITA, a protocol-independent transport architecture that enables full data-path programmability while sustaining line-rate performance. PITA eliminates protocol-specific assumptions by structuring the data-path around a uniform abstraction over events, state, and instructions, and rethinks core components, including scheduling, packet generation, and data reassembly, to operate on this abstraction. We evaluate PITA along key dimensions reflecting the goals of its protocol-agnostic data-path design. Specifically, we show that PITA supports diverse protocol semantics by showing it can implement TCP and RoCE on the same data-path and preserve their distinct end-to-end behavior. Through targeted microbenchmarks and synthesis on Alveo U250 cards, we show that PITA's redesigned components sustain high performance under demanding conditions, with modest hardware overhead and meeting timing at 250 MHz.
Summary / 总结
The network transport layer is increasingly implemented in the NIC hardware to meet the performance demands of modern workloads, but this has made it difficult to evolve or deploy new transport protocols.
AdvNet: Revealing Performance Issues in Network Protocols by Generating Adversarial Environments
Authors: Shehab Sarar Ahmed, William Sentosa, Yinjie Zhang, Yoav Lebendiker, Michael Shnaiderman, Tomer Gilad, Nathan H. Jay, Brighten Godfrey, Michael Schapira
First: 2026-05-01T16:12:27+00:00 · Latest: 2026-05-04T02:55:17+00:00
Comments: 18 pages, 8 figures
Abstract
Infrastructure protocols like Congestion Control (CC) seek to provide reliable performance across a wide range of Internet environments. Currently, protocol designers assess performance through hand-designed test cases or data sets captured from real environments. However, such approaches may inadvertently overlook critical facets of the algorithm's behavior when they encounter an unanticipated environment or workload.
We seek to understand the unanticipated with AdvNet, a system that automatically generates adversarial network environments that cause a target protocol implementation to perform poorly. AdvNet employs machine learning-based optimization to generate environments, and incorporates a robust noise-handling mechanism to mitigate the variability inherent in real-world protocol performance. Although our approach is more general, this paper focuses specifically on transport protocols and their CC implementations. We showcase AdvNet's capability to create adversarial scenarios for 27 kernel-space implementations of both single-path and multi-path CC protocols, for several use cases with different performance goals. AdvNet identifies problematic network conditions that expose previously unnoticed Linux kernel bugs and uncovers hidden limitations in CC implementations, and provides insights about robustness. These results suggest that automated adversarial testing can be a valuable tool in protocol development, and that robustness is a useful new dimension for benchmarking CC protocols.
Summary / 总结
Infrastructure protocols like Congestion Control (CC) seek to provide reliable performance across a wide range of Internet environments.
DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training
Authors: Zechen Ma, Zixi Qu, Jinyan Yi, David Lin, Yashar Ganjali
First: 2026-05-03T17:47:53+00:00 · Latest: 2026-05-03T17:47:53+00:00
Abstract
Distributed machine learning (ML) training has become a necessity with the prevalence of billion to trillion-parameter-scale models. While prior work has improved training efficiency from the ML perspective at the application layer, it often fails to address transient congestion events at the network layer that introduce severe tail latency and training-time variability, thereby undermining the quality of service (QoS) of distributed ML training systems. Existing network optimizations treat all gradients equally and thus fail to integrate sufficient model-training insights into communication protocol design.
In this paper, we present Dynamic Bounded-Loss Protocol (DBLP), a burst-resilient, training-phase-aware, and hardware-agnostic transport protocol that incorporates model-level tolerance properties into gradient communication. By dynamically adjusting gradient loss tolerance across training phases, DBLP reduces overall training time and mitigates tail-latency collapse during transient high-loss events (i.e., microbursts).
Compared to the current state-of-the-art solution (baseline), DBLP tolerates significantly higher loss while achieving comparable test accuracy, and reduces end-to-end training time by an average of 24.4% and a maximum of 33.9%. At microburst events, DBLP achieves up to 5.88x single-round communication latency speedups over the baseline, preventing burst-induced tail-latency spikes and maintaining stable training performance.
Summary / 总结
Distributed machine learning (ML) training has become a necessity with the prevalence of billion to trillion-parameter-scale models.
6G Needs Agents: Toward Agentic AI-Native Networks for Autonomous Intelligence
Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
First: 2026-05-02T17:24:12+00:00 · Latest: 2026-05-02T17:24:12+00:00
Abstract
Sixth-generation (6G) networks are increasingly envisioned as AI-native infrastructures integrating communication, sensing, and computing into a unified fabric. However, existing approaches remain largely optimization-centric, relying on closed-loop control with limited reasoning capability. In this paper, we argue for a paradigm shift toward Agentic AI-Native 6G, in which Large Language Model (LLM)-based agents operate as bounded, policy-governed reasoning entities within a semantic control plane layered above deterministic 3GPP infrastructure. We propose a four-layer architecture that integrates deterministic network infrastructure, semantic abstraction of intent and context, hierarchical reasoning, and a distributed multi-agent fabric spanning device, edge, and core domains. To assess feasibility, we develop a proof-of-concept agentic reasoning and orchestration framework and conduct an extensive empirical study using a domain-specific 6G benchmark under realistic deployment constraints. Our results reveal a fundamental tradeoff between reasoning capability and system efficiency, showing that no single model simultaneously satisfies latency, throughput, and accuracy requirements. Instead, heterogeneous deployment of LLM agents across the device-edge-core continuum is necessary to balance these constraints. We further demonstrate that quantization introduces non-uniform effects across models, reinforcing the need for system-level optimization rather than model-level compression alone. These findings establish agentic intelligence as a viable architectural direction for 6G and highlight key challenges in achieving scalable, trustworthy, and self-reasoning networks. All experimental results and evaluation scripts are publicly available to support reproducibility.
Summary / 总结
Sixth-generation (6G) networks are increasingly envisioned as AI-native infrastructures integrating communication, sensing, and computing into a unified fabric.
MORPH: Multi-Environment Orchestrated Reinforcement Learning for PRB Handling in O-RAN
Authors: Alireza Ebrahimi Dorcheh, Tolunay Seyfi, Ryan Barker, Fatemeh Afghah
First: 2026-05-01T21:55:50+00:00 · Latest: 2026-05-01T21:55:50+00:00
Abstract
Reinforcement-learning (RL) solutions for dynamic spectrum access and radio resource management in Open Radio Access Networks (O-RAN) depend critically on the fidelity of the throughput signal used for training. Analytical or physical-layer (PHY)-only simulators scale well but often miss protocol-stack effects such as signaling overhead and retransmissions, whereas exhaustive throughput profiling on a standards-compliant 5G stack is slow and can be unstable under software execution constraints. This paper presents MORPH, a measurement-grounded multi-environment RL pipeline for slice-aware PRB-level spectrum allocation (spectrum sharing and slice isolation within a single gNB) built on OpenAirInterface (OAI) 5G-NR RF-simulator mode. MORPH leverages three complementary throughput sources: (i) application-layer throughput measured via iPerf on the OAI stack under controlled AWGN pathloss settings, (ii) empirical MCS-selection distributions conditioned on path loss, enabling a distribution-aware theoretical throughput estimator that reflects standards-compliant link adaptation, and (iii) scalable throughput estimates from a 3GPP-parameterized PHY-fidelity OFDM simulator. Using these components, we train and compare agents that differ only in the origin of their throughput feedback: an OAI-grounded practical agent, a simulator-driven agent, and MORPH, which fuses real and synthetic throughput signals for policy optimization. Evaluation on the OAI execution harness across heterogeneous slicing scenarios shows that MORPH yields more robust slice-wise performance and improved SLA compliance than single-source training, providing a practical foundation for PRB-level spectrum sharing and slice isolation within a single-cell stack and a stepping stone toward multi-cell spectrum coordination and interference management.
Summary / 总结
Reinforcement-learning (RL) solutions for dynamic spectrum access and radio resource management in Open Radio Access Networks (O-RAN) depend critically on the fidelity of the throughput signal used for training.
TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests
Authors: Haarika Manda, Manshi Sagar, Yogesh, Kartikay Singh, Cindy Zhao, Tarun Mangla, Phillipa Gill, Elizabeth Belding, Arpit Gupta
First: 2025-10-24T04:25:16+00:00 · Latest: 2026-05-01T17:20:17+00:00
Abstract
Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of MB, and collectively, platforms like Ookla, M-Lab, and Fast.com generate petabytes of traffic each month. Reducing this burden requires deciding when a test can be stopped early without sacrificing accuracy. We frame this as an optimal stopping problem and show that existing heuristics (static thresholds, BBR pipe-full signals, or throughput stability rules from Fast.com and FastBTS) capture only a narrow slice of the achievable accuracy-savings trade-off. This paper introduces TurboTest, a systematic framework for speed test termination that sits atop existing platforms. The key idea is to decouple throughput prediction (Stage 1) from test termination (Stage 2): Stage 1 trains a regressor to estimate final throughput from partial measurements, while Stage 2 trains a classifier to decide when sufficient evidence has accumulated to stop. Leveraging richer transport-level features (RTT, retransmissions, congestion window) alongside throughput, TurboTest exposes a single tunable parameter epsilon for accuracy tolerance and includes a fallback mechanism for high-variability cases. Evaluation on 1 million M-Lab NDT speed tests (2024-2025) shows that TurboTest achieves 1.8-4.4x higher data savings than an approach based on BBR signals while reducing median error. These results demonstrate that adaptive ML-based termination can deliver accurate, efficient, and deployable speed tests at scale.
Summary / 总结
Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of MB, and collectively, platforms like Ookla, M-Lab, and Fast.com generate petabytes of traffic each month.
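The two-stage decoupling described in the TurboTest abstract above can be sketched in a few lines of Python. This is only an illustration of the control flow under stated assumptions: the `run_test` driver and its `min_samples` parameter are hypothetical, the learned regressor and classifier are replaced by trivial stand-ins (`mean_of_recent` and `stable_within`), and the fallback is assumed to be "run the full test and report the mean of all samples", which the abstract does not specify.

```python
def run_test(samples, predict, should_stop, min_samples=3):
    """Return (throughput_estimate, samples_used).

    predict: Stage 1, maps partial measurements to a final-throughput estimate.
    should_stop: Stage 2, maps the history of estimates to a stop decision.
    """
    history = []
    for used in range(1, len(samples) + 1):
        history.append(predict(samples[:used]))
        if used >= min_samples and should_stop(history):
            return history[-1], used
    # Assumed fallback: the test ran to completion, report the overall mean.
    return sum(samples) / len(samples), len(samples)


def mean_of_recent(partial, window=3):
    """Stand-in for the Stage 1 regressor: mean of the latest samples."""
    recent = partial[-window:]
    return sum(recent) / len(recent)


def stable_within(epsilon, window=3):
    """Stand-in for the Stage 2 classifier: recent estimates agree within epsilon."""
    def decide(history):
        recent = history[-window:]
        return (max(recent) - min(recent)) / max(recent) <= epsilon
    return decide
```

Calling `run_test([50, 90, 100, 100, 100, 100, 100, 100], mean_of_recent, stable_within(0.05))` stops after six of the eight samples, whereas a trace that keeps oscillating runs to the end and takes the fallback path. Swapping either stand-in for a trained model leaves the driver unchanged, which is the point of keeping the two stages separate.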
Inductive Latent Context Persistence: Closing the Post-Handover Cold Start in 6G Radio Access Networks
Authors: Anubhab Banerjee, Daniyal Amir Awan
Venue: ICML 2026
First: 2026-05-01T12:00:06+00:00 · Latest: 2026-05-01T12:00:06+00:00
Comments: submitted to 'AI for Next G' workshop, co-located with ICML 2026
Abstract
In modern radio access networks (RANs), rule-based handover (HO) decisions (e.g., A3/A5) depend on user equipment (UE) measurements only, so UEs at the same location can receive inconsistent HO outcomes. GNN-based methods improve HO KPIs using richer context than measurements alone. However, recurrent or graph models discard the per-UE recurrent state at HO and reinitialize at the target next-generation Node B (gNB), losing mobility history and forcing the target model to rebuild from post-HO measurements only. We address this post-HO cold start with Inductive Latent Context Persistence (ILCP), compressing the source recurrent state, transporting it on the 3GPP Xn as a 128-byte payload, and adapting it at the target gNB. We model the RAN as a dynamic heterogeneous graph over UE nodes, gNB nodes, measurement edges, and Xn edges. On a Vienna 4G/5G drive-test, ILCP achieves 0.0% ping-pong HOs versus 6.5% for an identical no-transfer baseline and 22.6% for a Transformer baseline; post-HO accuracy improves by +5.1 pp on average (peak +13.3 pp) in the 50-250 ms window. On one NVIDIA GTX 1080 (8 GB), ILCP runs end-to-end at 7.7 ms p99 per handover decision. Under perturbations (shadow fading, NLOS blockage, SSB-burst sparsity), robustly trained ILCP keeps handover failure (HOF) in the 10-13% range. Under the same fixed-reference-label setting, A3/A5 rises from 1.1% to 57-65% HOF when measurements are perturbed, exposing limits of measurement-only rules.
Summary / 总结
In modern radio access networks (RANs), rule-based handover (HO) decisions (e.g., A3/A5) depend on user equipment (UE) measurements only, so UEs at the same location can receive inconsistent HO outcomes.