STEPS: Semantic Contract-Guided Scheduling for LLM-Assisted Natural Language-Driven Edge AI Services
Authors: Houyi Qi, Minghui Liwang, Xianbin Wang, Xinlei Yi, Seyyedali Hosseinalipour
First: 2026-06-08T14:19:47+00:00 · Latest: 2026-06-15T16:32:11+00:00
Abstract
Edge user/service scheduling has become a cornerstone of distributed AI systems, determining where and how AI services are executed under limited communication and computing resources. Existing edge scheduling frameworks usually assume that service requirements are given as numerical constraints, such as latency bounds or energy budgets. In practice, users often express service expectations through ambiguous and context-dependent natural language, creating a gap between user intent and scheduling decisions. To bridge this semantic-to-optimization gap, we propose semantic contract-guided edge potential scheduling (STEPS), a natural language-driven scheduling framework that introduces semantic contracts as executable interfaces between user-side semantics and edge-side decision making. In STEPS, a large language model (LLM)-assisted parser interprets natural language requests and extracts semantic service requirements with confidence scores, which are converted into service requirements and semantic uncertainty. Based on this information, STEPS formulates edge scheduling as a contract-guided potential game that jointly determines execution-node selection, computing-resource provisioning, and bandwidth allocation. STEPS further uses feedback signals to support adaptive scheduling under evolving service and network conditions. We characterize the exact potential game structure, establish the existence of a pure-strategy Nash equilibrium, and prove convergence and stability properties of the scheduling and adaptation processes. Extensive experiments show that STEPS improves semantic contract fulfillment, reduces contract-guided service loss, and maintains robust adaptation under ambiguous natural language requests in non-stationary networked AI environments.
Summary / 总结
Edge user/service scheduling has become a cornerstone of distributed AI systems, determining where and how AI services are executed under limited communication and computing resources.
Single-Connection Mixed-Criticality Transport with CATS: Bounded Guarantees, Three Structural Limits, and a QUIC Escape
Authors: Syed Muhammad Aqdas Rizvi
First: 2026-06-15T16:27:07+00:00 · Latest: 2026-06-15T16:27:07+00:00
Comments: 9 pages, 4 figures, 1 table
Abstract
Mixed-criticality applications running over satellite terminals, industrial telemetry, embedded systems, and tactical or other constrained links often multiplex a small, latency-critical message class and bulk traffic over a single commodity transport connection. A single FIFO connection can starve the critical class under load. The obvious alternative, opening parallel connections, costs an additional five-tuple (often blocked by carrier-grade NAT, port budgets, and operator policy) and is not always available; when the critical class is light, two connections can also be bandwidth-fair only in aggregate rather than single-flow fair.
We present CATS (Conductor-driven Asymmetric Transport Scheme), a sender-side, receiver-transparent transport-layer priority scheme over TCP: a Conductor assigns each message a priority class and just-in-time sequence numbers, using a credit-based shaper. CATS provides the one combination its alternatives cannot: deterministic non-starvation together with single-flow fairness, plus a provable bounded per-class delay. We then show that, crucially, CATS-over-TCP is not a tail-latency mechanism, and why. Three structural barriers bound in-band priority: the in-order sequence space (head-of-line blocking), the shared congestion window (cross-class coupling), and the per-flow granularity of network QoS (in-band priority is invisible to it). These barriers explain why fair-queuing and even the modern low-latency standard L4S cannot help a single connection, and why two parallel connections reduce the latency tail at the cost of an additional flow. We give CATS-over-QUIC as the principled escape: independent streams with per-stream isolation under aggregate-coupled congestion control self-isolate at the endpoint, attaining the guarantees on one fair flow. An ns-3 evaluation and QUIC proof-of-concept support the findings.
Summary / 总结
Mixed-criticality applications running over satellite terminals, industrial telemetry, embedded systems, and tactical or other constrained links often multiplex a small, latency-critical message class and bulk traffic over a single commodity transport connection.
Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts
Authors: Yijun Lu, Zihan Fang, Pengpeng Qiao, Zheng Lin, Jing Yang, Yuxin Zhang, Por Lip Yee, Zhe Chen, Jun Luo
First: 2026-06-14T06:22:09+00:00 · Latest: 2026-06-14T06:22:09+00:00
Comments: 6 pages, 4 figures
Abstract
The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL) emerges as the paradigm for privacy-preserving collaborative optimization, integrating MoE into FL under data heterogeneity may trigger conflicting expert optimizations. Client-specific data distributions force same-indexed experts to optimize under inconsistent or even conflicting feature-label correlations. This mismatch induces destructive interference during aggregation, thus destabilizing the optimization trajectory and degrading model performance. To address this issue, we propose FC-MoE, a federated conflict-aware framework for MoE fine-tuning. It employs an importance-aware weighting scheme to prioritize reliable local updates and uses gradient consensus projection to suppress conflicting updates, ensuring a stable global optimization path. Moreover, a local knowledge retention mechanism further preserves specialized client expertise by re-anchoring domain-specific residuals. Extensive experiments demonstrate that FC-MoE accelerates convergence and enhances both global and local model performance in non-IID federated environments.
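The abstract does not spell out how gradient consensus projection is computed. As a rough illustration of the general idea of suppressing conflicting updates, the sketch below uses a PCGrad-style projection, which is a stand-in technique and not necessarily the FC-MoE rule: when a client update points against a reference direction (negative inner product), the opposing component is removed. The function names and the choice of reference vector are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(update: Vector, reference: Vector) -> Vector:
    """Remove the component of `update` that opposes `reference`.

    PCGrad-style projection (an assumed stand-in, not the paper's rule):
    if the two vectors conflict, subtract the projection of `update`
    onto `reference`; otherwise return the update unchanged.
    """
    overlap = dot(update, reference)
    ref_norm_sq = dot(reference, reference)
    if overlap >= 0.0 or ref_norm_sq == 0.0:
        return list(update)
    scale = overlap / ref_norm_sq
    return [u - scale * r for u, r in zip(update, reference)]


if __name__ == "__main__":
    consensus = [0.0, 1.0]        # hypothetical aggregate direction
    client_update = [1.0, -1.0]   # conflicts with the consensus on axis 1
    print(project_out_conflict(client_update, consensus))
```

After projection the update is orthogonal to the reference direction, so it no longer cancels progress along it during aggregation.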
Summary / 总结
The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation.
From Privacy to Workflow Integrity: Communication-Graph Metadata in Autonomous Agent Interoperability
Authors: Bijaya Dangol
First: 2026-06-05T11:07:55+00:00 · Latest: 2026-06-13T10:44:52+00:00
Comments: 18 pages, 6 figures, 6 tables
Abstract
Agent-interoperability protocols such as A2A and MCP standardize what agents say to one another but assume address-based transport. Whether over HTTP(S) or a content-protecting binding such as MLS-based SLIM, these transports protect message content yet leave the communication graph exposed: which agent contacts which, when, and how often. In agent systems this graph is more consequential than a privacy framing suggests. Endpoints are capability-labeled, workflows are structured and chained, and interactions are coupled to real actions, so an observer recovers more than past relationships: it can infer the pending workflow and, at machine speed, act on that inference before the workflow completes. The threat is therefore one of workflow integrity, not privacy alone. We formalize a threat model for the communication graph and locate what makes its metadata distinctively consequential: not stronger fingerprinting, which we measure to be comparable to other machine traffic, but exposure across independent trust domains, coupled to autonomous action. We define transport- and bootstrap-layer privacy properties, evaluate candidate transports, and give an A2A case study where a metadata-protecting binding surfaces the protocol's implicit identity assumptions. On a generative model anchored to a real capture and over a live A2A binding, a label-blind classifier recovers a task's class from passive metadata well above chance, and from only its opening; a defense-aware adversary does not overturn this, and only the full set of properties drives recovery toward chance. The leverage of acting on the leak is distinct from recoverability: under a fixed budget an adversary realizes most of a clairvoyant attacker's advantage from a workflow's opening, governed by precision over the top-ranked workflows rather than overall accuracy, so a defense suppresses it even while recovery stays above chance.
Summary / 总结
Agent-interoperability protocols such as A2A and MCP standardize what agents say to one another but assume address-based transport.
Solyx AI Grid: Hardware-Telemetry-Aware Routing Across Geographically Distributed GPU Clusters
Authors: Aleks Bernhard, Nithin Katla
First: 2026-06-13T01:40:29+00:00 · Latest: 2026-06-13T01:40:29+00:00
Comments: 15 pages, 9 tables
Abstract
As GPU capacity fragments across geographically distributed sites, single-cluster LLM inference routing assumptions break down in measurable ways. We present Solyx AI Grid, a cross-site inference routing control plane that integrates GPU hardware telemetry (DCGM), vLLM application metrics, and real-time WAN signals (RTT, jitter) into per-request placement decisions via a 10-signal weighted pressure scorer. Across two empirical campaigns--six H100/H200 SXM GPUs and nine RTX PRO 6000 Blackwell SE GPUs spanning three US datacenters, eight workload classes, and a 216-cell SLO matrix--Solyx AI Grid delivers 1.56--1.75x throughput at tier-2 SLO over round-robin across all eight classes, cuts capability-mismatch leakage to 0.43% (versus 32% for standard routers), and reroutes around failures at a p99 of 1,247 ms versus 4,226 ms. We further find that GPU hardware telemetry leads application-layer SLO breach by 11.2 seconds on average, enabling proactive traffic drain before user-facing latency impact. To our knowledge, this is the first public empirical study of live physical multi-site LLM inference routing combining hardware telemetry, application metrics, and active WAN path signals.
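The abstract names a 10-signal weighted pressure scorer but lists neither the signals nor their weights. The sketch below shows only the general shape of such a scorer, under assumed values: three hypothetical signals, each normalized to [0, 1], are combined with made-up weights, and the request goes to the site with the lowest pressure.

```python
from typing import Dict

# Hypothetical signals and weights; the abstract does not list the real ones.
WEIGHTS: Dict[str, float] = {
    "gpu_util": 0.5,     # stands in for a hardware-telemetry signal
    "queue_depth": 0.3,  # stands in for a serving-layer metric
    "rtt": 0.2,          # stands in for a WAN path signal
}


def pressure(signals: Dict[str, float]) -> float:
    """Weighted sum of normalized signals; higher means a more loaded site."""
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


def pick_site(sites: Dict[str, Dict[str, float]]) -> str:
    """Return the name of the site with the lowest pressure score."""
    return min(sites, key=lambda name: pressure(sites[name]))


if __name__ == "__main__":
    sites = {
        "site_a": {"gpu_util": 0.9, "queue_depth": 0.5, "rtt": 0.1},
        "site_b": {"gpu_util": 0.4, "queue_depth": 0.2, "rtt": 0.6},
    }
    print(pick_site(sites))
```

A linear score keeps each signal's contribution easy to inspect. Whether the real scorer is linear is not stated in the abstract.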
Summary / 总结
As GPU capacity fragments across geographically distributed sites, single-cluster LLM inference routing assumptions break down in measurable ways.
StreamRTPS: Increasing DDS Bandwidth Efficiency by Reducing Protocol Overhead
Authors: David Philipp Klüner, Stefan Kowalewski, Alexandru Kampmann
First: 2026-06-12T07:51:41+00:00 · Latest: 2026-06-12T07:51:41+00:00
Comments: 8 pages, 8 figures
Abstract
In this paper, we propose three extensions to the Real-Time Publish Subscribe wire protocol, on which Data Distribution Service (DDS) is based, to improve bandwidth efficiency. First, a stream negotiation mechanism exchanges static header information during discovery, replacing the full RTPS header at runtime with a compact 2 B identifier. Second, a payload aggregation scheme combines samples for the same locator into single UDP packets, reducing IP and UDP header costs. Third, a predictive heartbeat suppression strategy reduces control traffic by omitting heartbeats for periodic communication patterns, falling back upon detected loss or timing violations. All three mechanisms preserve Real-Time Publish Subscribe (RTPS) compatibility by extending DDS discovery to activate these features when supported. Experimental results show that stream headers reduce bandwidth consumption by up to 27.9 % compared to conventional RTPS under best-effort transport, and that heartbeat suppression yields a further 22.7 % reduction on top of stream headers under reliable transport, while preserving transmission latency in both cases.
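The first two mechanisms can be illustrated with simple byte accounting. The sketch below is a simplified model under stated assumptions, not the paper's measurement setup: it assumes IPv4 without options (20 B), UDP (8 B), the standard 20 B RTPS message header, the 2 B stream identifier from the abstract, and an illustrative 24 B of per-sample submessage overhead. Its outputs are not expected to match the reported 27.9 % figure.

```python
IPV4_HEADER_BYTES = 20     # assumes IPv4 without options
UDP_HEADER_BYTES = 8
RTPS_HEADER_BYTES = 20     # standard RTPS message header
STREAM_ID_BYTES = 2        # compact identifier described in the abstract
SUBMESSAGE_OVERHEAD = 24   # illustrative per-sample overhead (assumption)


def wire_bytes_per_sample(payload: int, samples_per_packet: int = 1,
                          stream_header: bool = False) -> float:
    """Average bytes on the wire per sample under the simplified model."""
    if samples_per_packet < 1:
        raise ValueError("samples_per_packet must be at least 1")
    header = STREAM_ID_BYTES if stream_header else RTPS_HEADER_BYTES
    packet = (IPV4_HEADER_BYTES + UDP_HEADER_BYTES + header
              + samples_per_packet * (SUBMESSAGE_OVERHEAD + payload))
    return packet / samples_per_packet


def saving(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    base = wire_bytes_per_sample(16)
    compact = wire_bytes_per_sample(16, stream_header=True)
    batched = wire_bytes_per_sample(16, samples_per_packet=4, stream_header=True)
    print(base, compact, batched, round(saving(base, compact), 4))
```

The model shows why both mechanisms matter most for small payloads: fixed headers dominate the packet, so shrinking or amortizing them yields a large relative saving.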
Summary / 总结
In this paper, we propose three extensions to the Real-Time Publish Subscribe wire protocol, on which Data Distribution Service (DDS) is based, to improve bandwidth efficiency.
The Internet Runs on Names
Authors: Geoff Huston, Lixia Zhang
First: 2026-05-15T06:01:30+00:00 · Latest: 2026-06-12T02:53:07+00:00
Abstract
The Internet's TCP/IP architecture was designed for resilient packet delivery between hosts identified by IP addresses. Over time, however, the consolidation of applications and services into large-scale platforms built on that universal packet-delivery substrate drove deployment practices that fundamentally changed the Internet's operational model: the network now operates primarily on names. DNS names have become the basis for service identity, reachability, load balancing, and trust, while IP addresses have become ephemeral routing locators. This change was driven by application needs and platform consolidation in the absence of any overarching plan. The resulting mismatch between the original address-based design and the current name-based operation leads to serious consequences: operational complexity that grows with each new layer of indirection, fragility, and vulnerability - as seen in recent high-profile outages. This paper exposes this mismatch as a necessary first step toward understanding its consequences and addressing the risks of continuing on the same path.
Summary / 总结
The Internet's TCP/IP architecture was designed for resilient packet delivery between hosts identified by IP addresses.
Defending the Core: A Centrality-Based Protection Strategy for Supply Chain Security in npm Dependency Network
Authors: Zixin Wang
First: 2026-06-12T02:22:44+00:00 · Latest: 2026-06-12T02:22:44+00:00
Abstract
The modern software supply chain, as exemplified by the Node Package Manager (npm) dependency network, relies heavily on shared open-source dependencies. While this promotes rapid development, it also introduces systemic vulnerabilities. To assess this risk, we analyze the npm dependency network by modeling 53,481 packages and 78,520 dependency edges, and classify the network as a scale-free topology. This classification implies an inherent vulnerability to targeted attacks on high-degree hubs. To mitigate this, we propose and evaluate a dual-pronged defense strategy consisting of Centrality-Based Node-Hardening and a Dependency Weight Warning system. Moreover, by simulating the network under various attack scenarios, we show that applying strict security protocols to just the top 1% of nodes, combined with pruning 30% of structurally trivial edges, prevents catastrophic network collapse and neutralizes cascading malware infections. The source code can be found at https://github.com/5tarWhee1/Centrality-Based-Protection-Strategy-for-Supply-Chain-Security-in-npm-Dependency-Network.
Summary / 总结
The modern software supply chain, as exemplified by the Node Package Manager (npm) dependency network, relies heavily on shared open-source dependencies.
The Bilateral Efficiency of Ethernet: Recalibrating Metcalfe and Boggs After Fifty Years
Authors: Paul Borrill
First: 2026-03-19T18:57:48+00:00 · Latest: 2026-06-11T23:39:08+00:00
Comments: 15 pages, including an appendix on the Open Aethernet fault model. 50th anniversary of Metcalfe and Boggs (1976). v2: renamed Open Aethernet; quantum/TSVF framing removed; added bilateral-zigzag history, intra-rack scope (<= 1 m), and the fault-model appendix
Abstract
In July 1976, Metcalfe and Boggs published their foundational paper on Ethernet in Communications of the ACM. Their efficiency model -- E = (P/C)/(P/C + W*T) -- measures the fraction of Ether time carrying good forward packets under contention. For fifty years this model has framed how the community thinks about Ethernet performance. We argue it is silent on the question that matters for modern intra-rack interconnect: bilateral transaction efficiency -- the fraction of link time that produces committed agreements between sender and receiver. Metcalfe and Boggs themselves planted the seed in their EFTP "end-dally" protocol (Section 7.2.2), and the deeper anchor is older still: Abramson's Alohanet carried positive acknowledgments at the link layer -- a bilateral mechanism Metcalfe consciously removed in 1973 to obtain Ethernet's simple, ACK-free packet format. The result is a fifty-year bilateral zigzag: Aloha (bilateral) to Ethernet (unilateral) to the EFTP end-dally (bilateral) to TCP (unilateral-with-bilateral-above). We formalize bilateral efficiency, connect it to the back-to-back Shannon channel with Perfect Information Feedback, and -- scoping the claim explicitly to intra-rack distances of one meter or less -- describe how the Open Aethernet link recovers mutual knowledge at the link layer. The correction to Table 1 is not a different set of numbers. It is a different question.
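As a rough illustration of the efficiency model quoted above, the sketch below evaluates E = (P/C)/(P/C + W*T). The abstract does not define W. The code assumes the contention model usually attributed to the 1976 paper: each of Q stations transmits in a slot with probability 1/Q, so the slot-acquisition probability is A = (1 - 1/Q)^(Q-1) and the mean number of waiting slots is W = (1 - A)/A. The default channel rate (3 Mb/s) and slot time (16 microseconds) are also assumptions chosen for illustration.

```python
def acquisition_probability(q: int) -> float:
    """Probability that exactly one of q stations transmits in a slot.

    Assumed model: each station transmits with probability 1/q.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    return (1.0 - 1.0 / q) ** (q - 1)


def ethernet_efficiency(p_bits: float, q: int,
                        c_bps: float = 3e6, t_slot: float = 16e-6) -> float:
    """E = (P/C) / (P/C + W*T), with W = (1 - A) / A (assumed definition)."""
    a = acquisition_probability(q)
    w = (1.0 - a) / a                  # mean slots spent in contention
    transmit_time = p_bits / c_bps     # P/C, seconds per packet
    return transmit_time / (transmit_time + w * t_slot)


if __name__ == "__main__":
    for q in (1, 2, 10, 256):
        print(q, round(ethernet_efficiency(4096, q), 4))
```

Under these assumptions, efficiency falls as more stations contend and rises with packet length, because a longer packet amortizes the same contention interval over more transmitted bits.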
Summary / 总结
In July 1976, Metcalfe and Boggs published their foundational paper on Ethernet in Communications of the ACM.
A Tutorial on IEEE 802.11bn Multi-AP Coordination for Wi-Fi 8: From Standardization to Performance Evaluation
Authors: Francesc Wilhelmi, Boris Bellalta, Giovanni Geraci, Lorenzo Galati-Giordano, Francesca Meneghello, Aleksandra Kijanka, Iñaki Val, David López-Pérez
First: 2026-06-11T16:44:37+00:00 · Latest: 2026-06-11T16:44:37+00:00
Abstract
The IEEE 802.11bn amendment defines significant modifications to the standard by establishing Ultra High Reliability (UHR) targets in Wireless Local Area Networks (WLANs). This is expected to deliver substantial enhancements over previous standards, including modes of operation that increase throughput, reduce the 95th percentile of the latency distribution, and decrease MAC Protocol Data Unit (MPDU) loss (all by at least 25%) compared to Extremely High Throughput (EHT) operations defined in the 802.11be amendment. A fundamental innovation for achieving these ambitious goals is the introduction of Multi-Access Point Coordination (MAPC), an unprecedented feature whereby APs will be able to coordinate among themselves to enhance spectrum utilization and advance towards reliability. This paper provides a comprehensive overview and analysis of this key framework. We begin by reviewing existing AP coordination solutions that precede the 802.11bn standard, which serve as a foundation for understanding the transition to the current framework. We then describe the technical 802.11bn MAPC framework as defined by the task group. A detailed overview of each candidate MAPC feature is provided, contextualized with the relevant state-of-the-art. Furthermore, we introduce Kom8ndor, an open-source Wi-Fi 8 simulation tool, to evaluate these candidate MAPC features and showcase their potential to achieve UHR goals. Finally, we outline the future of MAPC beyond 802.11bn, exploring promising directions such as coordination schemes beyond 802.11bn (e.g., Joint Transmission (JT)) and new ideas.
Summary / 总结
The IEEE 802.11bn amendment defines significant modifications to the standard by establishing Ultra High Reliability (UHR) targets in Wireless Local Area Networks (WLANs).
Measurement-Based Performance Evaluation of SmartRSUs with Heterogeneous Antenna Architectures for V2X Communications
Authors: Marco Savarese, Gaetano Orazio Cauchi, Salvatore Iandolo, Antonio Solida, Martin Klapez, Maurizio Casoni, Micaela Verucchi, Enrico Vincenzi, Ignacio Sanudo Olmedo, Marko Bertogna, Carlo Augusto Grazia
First: 2026-06-11T13:26:05+00:00 · Latest: 2026-06-11T13:26:05+00:00
Comments: Accepted for publication at the 2026 IEEE International Workshop on Metrology for Automotive (MetroAutomotive 2026)
Abstract
This paper presents a measurement-based performance evaluation of two custom Smart Roadside Units (SmartRSUs) featuring different V2X antenna architectures. The first configuration integrates GNSS and communication antennas into an all-in-one rooftop module, whereas the second uses external dual ITS-G5 (IEEE 802.11p) antennas operating at 5.9 GHz and a dedicated GNSS antenna. Both systems are built upon a proprietary On-Board Unit (OBU) platform adapted for infrastructure deployment.
The experimental campaign evaluates key V2X communication metrics, including coverage, received signal strength indicator (RSSI), packet loss, and end-to-end latency in both transmission (OBU-to-infrastructure) and reception (infrastructure-to-OBU) directions. To ensure objective validation, a commercial off-the-shelf V2X Roadside Unit is co-located on the same infrastructure and used as a performance benchmark, providing ground-truth reference measurements under identical environmental conditions.
Results highlight the impact of antenna design and placement on communication reliability and latency, revealing trade-offs between integrated and external antenna configurations in real-world deployment scenarios. The findings provide practical insights for the design and optimization of next-generation SmartRSUs in cooperative intelligent transportation systems (C-ITS).
Summary / 总结
This paper presents a measurement-based performance evaluation of two custom Smart Roadside Units (SmartRSUs) featuring different V2X antenna architectures.
Feasibility Assessment of Remote Driving via Latency Analysis of ITS-G5 and Cellular Networks in the MASA Living Lab
Authors: Gaetano Orazio Cauchi, Antonio Solida, Salvatore Iandolo, Marco Savarese, Martin Klapez, Enrico Rossini, Marcello Pietri, Marco Picone, Marco Mamei, Maurizio Casoni, Carlo Augusto Grazia
First: 2026-06-11T12:47:32+00:00 · Latest: 2026-06-11T12:47:32+00:00
Comments: Accepted for publication at the IEEE 2026 Vehicular Technology Conference (VTC2026-Spring)
Abstract
Remote driving has gained increasing attention as a key enabler for connected and automated vehicles. Yet its practical deployment hinges on wireless networks' ability to guarantee low, predictable latency. In this paper, we present an extensive latency analysis of ITS-G5 and cellular (5G) technologies within the Modena Automotive Smart Area (MASA), a real-world, city-scale testbed equipped with a distributed intelligent transportation infrastructure. By conducting controlled experiments under varying network loads and traffic conditions, we measure network and end-to-end latency components relevant to remote driving, in which the uplink consists of a continuous video stream transmitted from the vehicle to the remote operator, and the downlink conveys control commands back to the car. Measurements conducted under diverse conditions reveal how latency and variability differ across the two technologies and how infrastructure coverage impacts video-stream transmission performance. Based on the observed latency distributions and reliability metrics, we assess the practical feasibility and safety margins of remote driving in mixed network environments. The results provide actionable insights for future teleoperation deployments and motivate hybrid communication strategies that combine the strengths of ITS-G5 and cellular networks.
Summary / 总结
Remote driving has gained increasing attention as a key enabler for connected and automated vehicles.
LNTest: A Testbed for Evaluating Bitcoin Lightning Network-Based Botnets
Authors: Thomas Bakaysa, Ahmet Kurt, Abdul-Salem Beibitkhan, Jesus Maria Romo Diaz de Leon, Tag Kalat, Joshua Kramer, Estela Rodriguez, Abraham Watkins, Abdullah Aydeger
First: 2026-06-11T04:29:49+00:00 · Latest: 2026-06-11T04:29:49+00:00
Comments: Accepted at the 21st International Conference on Availability, Reliability and Security (ARES 2026)
Abstract
Bitcoin's Lightning Network (LN) can be exploited as a covert, low-cost command-and-control (C&C) channel for botnets, as demonstrated by the LNBot and D-LNBot designs. However, both remain proof-of-concept prototypes evaluated only through simulation, leaving key questions about real-world topology formation, propagation complexity, and resilience to takedowns unanswered. We present LNTest, the first reusable testbed for LN-based botnets, built from Core Lightning nodes containerized with Docker over a shared Bitcoin Core regtest chain. LNTest supports three overlay topology modes (a deterministic chain, autonomous peer discovery, and user-supplied graphs), enabling controlled experiments across different botnet structures. Using LNTest, we report three main findings. First, D-LNBot's autonomous formation protocol does not produce the uniform chain from its design; instead, it creates a clustered chain in which cliques are linked by bridge nodes whose removal fragments the network. Second, command propagation scales linearly with botnet size ($\Theta(n)$), not the $O(m \log n)$ previously claimed, and gains nothing from higher neighbor connectivity. Third, the overlay topology determines the effectiveness of takedown strategies: uniform-degree chains resist targeted removal but fragment under random failure, scale-free topologies show the opposite pattern, and the autonomous clustered chain is fragile under both, making it the most vulnerable of the three. LNTest is released as open source, with a script that reproduces all our experiments, to support reproducible research on LN-based botnet defenses.
Summary / 总结
Bitcoin's Lightning Network (LN) can be exploited as a covert, low-cost command-and-control (C&C) channel for botnets, as demonstrated by the LNBot and D-LNBot designs.
The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale
Authors: Quanyan Zhu
First: 2026-06-11T03:02:59+00:00 · Latest: 2026-06-11T03:02:59+00:00
Abstract
The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents discover one another, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments. We synthesize foundations from single-agent agentic AI, multi-agent systems, distributed computing, communication networks, game theory, and security engineering to characterize the architectures and mechanisms required for scalable agent ecosystems. The paper examines agent deployment models, workflow lifecycles, communication protocols, interoperability layers, resource-management challenges, and trust architectures, with case studies in adaptive manufacturing and distributed operational coordination. The resulting framework highlights the central research challenges of controlled emergence, semantic interoperability, secure identity, incentive-compatible coordination, resource-aware orchestration, and governance for large-scale networks of autonomous agents.
Summary / 总结
The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action.
Compact LLM Deployment and World Model Assisted Offloading in Mobile Edge Computing
Authors: Ruichen Zhang, Xiaofeng Luo, Jiayi He, Jiawen Kang, Zehui Xiong, Shiwen Mao
First: 2026-02-14T06:37:29+00:00 · Latest: 2026-06-10T14:16:41+00:00
Comments: 16 pages, 10 figures
Abstract
This paper investigates compact large language model (LLM) deployment and world-model-assisted inference offloading in mobile edge computing (MEC) networks. We first propose an edge compact LLM deployment (ECLD) framework that jointly applies structured pruning, low-bit quantization, and knowledge distillation to construct edge-deployable LLM variants, and we evaluate these models using four complementary metrics: accessibility, energy consumption, hallucination rate, and generalization accuracy. Building on the resulting compact models, we formulate an MEC offloading optimization problem that minimizes the long-term average inference latency subject to per-device energy budgets and LLM-specific quality-of-service constraints on effective accuracy and hallucination. To solve this problem under unknown and time-varying network dynamics, we develop a world model-proximal policy optimization (PPO) algorithm, which augments an on-policy PPO algorithm with a learned recurrent world model that provides improved value targets and short imagination rollouts. Extensive experiments on Llama-3.1-8B, Qwen3-8B, and Mistral-12B show that ECLD compresses base models by about 70-80% in storage (e.g., from 15.3 GB to 3.3 GB for Llama-3.1-8B) and reduces per-query energy consumption by up to 50%, while largely preserving accuracy and often lowering hallucination compared with quantization-only or pruning-only baselines. The experiments also show that world model-PPO speeds up convergence by about 50%, improves the final reward by 15.8% over vanilla PPO, and reduces average inference latency by 12-30% across different user populations, while satisfying the accuracy and hallucination constraints and approaching the generation quality of always-offloading with much of the efficiency of local execution.
Summary / 总结
This paper investigates compact large language model (LLM) deployment and world-model-assisted inference offloading in mobile edge computing (MEC) networks.
LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence
Authors: Henok Daniel, Omar Alhussein, Cheng Li, Jie Liang, Ernesto Damiani
First: 2026-06-10T10:00:26+00:00 · Latest: 2026-06-10T10:00:26+00:00
Comments: 20 pages
Abstract
The Network Data Analytics Function (NWDAF) is central to enabling zero-touch network management in fifth-generation (5G) networks by supporting real-time analytics and closed-loop automation. Despite its critical role, open-source NWDAF implementations remain limited in scope and accessibility. In this paper, we develop an open-source NWDAF, compatible with the open-source core network Free5GC, that collects network data via subscriptions to Network Functions (NFs), and also includes an integrated Large Language Model (LLM) interface that enables natural language interaction with human operators. The interface processes user intents, encodes them using a semantic embedding model, and maps them to one of seven predefined intent categories to trigger analytics queries or event subscription commands. This architecture abstracts the complexity of traditional interfaces, allowing non-expert users to manage network analytics and subscriptions with ease. The system supports Access and Mobility Management Function (AMF) and Session Management Function (SMF) event subscriptions, real-time monitoring, and analytics retrieval via Prometheus, all accessible through a conversational interface. By bridging AI-driven intent recognition with standardized network analytics, our implementation enhances operator usability and provides a foundation towards AI-native 6G networks. The source code and datasets generated during the current study are available in the GitHub repository, https://github.com/HenokDanielbfg/testbed.
Summary / 总结
The Network Data Analytics Function (NWDAF) is central to enabling zero-touch network management in fifth-generation (5G) networks by supporting real-time analytics and closed-loop automation.
Resource-Aware LLM Reasoning for Mobile Edge General Intelligence
Authors: Mingyi Luo, Ruichen Zhang, Xiangwang Hou, Jun Du, Chunxiao Jiang, Yong Ren, Shiwen Mao
First: 2025-09-27T10:53:48+00:00 · Latest: 2026-06-10T07:43:17+00:00
Abstract
The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we systematically review enhancement methods to identify mechanisms suitable for edge adaptation. Subsequently, we present a distributed framework that synergizes reasoning enhancement via adaptive CoT prompting with scalable deployment through a distributed MoE architecture. An important innovation of this approach involves modeling reasoning depth as a dynamic network resource variable, which is optimized jointly with expert activation and transmission power. This mechanism allows the system to dynamically regulate expert networks and reasoning complexity according to task requirements and device capabilities. Experimental evaluations in mobile edge environments demonstrate that the proposed framework effectively balances reasoning quality and resource efficiency. The results show that with less than one second of additional inference time, both accuracy and latency satisfaction rate can reach 90%, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI systems.
Summary / 总结
The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities.
Tiara: A Programmable Line-Rate ISA for Remote Memory Access
Authors: Bojie Li
First: 2026-06-10T05:01:17+00:00 · Latest: 2026-06-10T05:01:17+00:00
Abstract
RDMA one-sided verbs are the natural primitive for memory disaggregation, but they require the client to supply the exact remote address. The 1-RTT performance breaks down when the target address depends on data that must first be read from remote memory, a pattern we call the Indirection Wall. Indirection is pervasive: graph traversals follow pointers hop by hop, address translation walks multi-level page tables, distributed coordination requires conditional multi-host logic, and disaggregated LLM inference must resolve paged KV caches through block-table lookups. Each level of indirection costs one sequentially dependent network round-trip, yet offloading to existing RDMA NICs either consumes remote CPU cycles or has limited throughput. We present Tiara, a compact, statically verifiable instruction set that executes on the memory-side NIC. Tiara operators are pre-registered programs, analogous to eBPF programs in the kernel, that resolve indirection locally, collapsing multi-RTT dependent chains into a single round-trip. On an FPGA-based prototype, Tiara reduces 10-hop graph-traversal latency by 2.85x over one-sided RDMA while sustaining 3.4x higher throughput, cuts page-table walk latency by 62%, reduces uncontended distributed-lock latency by 2.9x, achieves 2.8x throughput for disaggregated PagedAttention at 8 KB blocks, and 1.88x MoE expert-gather latency at 32 experts.
Summary / 总结
RDMA one-sided verbs are the natural primitive for memory disaggregation, but they require the client to supply the exact remote address.
Generative Explainability for Next-Generation Networks: LLM-Augmented XAI with Mutual Feature Interactions
Authors: Kiarash Rezaei, Omran Ayoub, Sebastian Troia, Francesco Lelli, Paolo Monti, Carlos Natalino
Venue: Proc. WiMob, Marrakesh, Morocco, 2025
First: 2026-06-09T14:48:26+00:00 · Latest: 2026-06-09T14:48:26+00:00
Comments: 7 pages, with one page for appendix. Accepted for publication at the 2025 21st International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
Abstract
As artificial intelligence and machine learning (AI/ML) models become integral to network operations, their lack of transparency poses a significant barrier to operator trust. Existing explainable artificial intelligence (XAI) techniques often fail to bridge this gap for non-specialists, producing technical outputs that are difficult to translate into actionable insights. This paper presents a framework specifically designed to address this shortcoming. It leverages a moderately sized large language model (LLM) and extends beyond the standard use of SHapley Additive exPlanations (SHAP) feature influence values. The framework employs a structured prompt enriched with mutual feature interaction data to generate human-understandable natural language explanations. To validate our framework, we performed an empirical evaluation on an optical quality of transmission (QoT) estimation use case with human evaluators. We collected independent performance evaluations from specialists, which showed a high inter-evaluator agreement. Compared to a state-of-the-art baseline that uses only SHAP feature influence values in a straightforward prompt, our approach improves the explanation usefulness and scope by 12.2% and 6.2%, while achieving 97.5% correctness.
Summary / 总结
As artificial intelligence and machine learning (AI/ML) models become integral to network operations, their lack of transparency poses a significant barrier to operator trust.
TraGe: A Generic Packet Representation for Traffic Classification Based on Header-Payload Differences
Authors: Chungang Lin, Yilong Jiang, Weiyao Zhang, Xuying Meng, Tianyu Zuo, Yujun Zhang
First: 2025-06-17T03:27:44+00:00 · Latest: 2026-06-09T12:46:23+00:00
Comments: This paper has been accepted by IWQoS 2025. The code is available at https://github.com/lincgcg/TraGe
Abstract
Traffic classification has a significant impact on maintaining the Quality of Service (QoS) of the network. Since traditional methods heavily rely on feature extraction and large-scale labeled data, some recent pre-trained models manage to reduce the dependency by utilizing different pre-training tasks to train generic representations for network packets. However, existing pre-trained models typically adopt pre-training tasks developed for image or text data, which are not tailored to traffic data. As a result, the obtained traffic representations fail to fully reflect the information contained in the traffic, and may even disrupt the protocol information. To address this, we propose TraGe, a novel generic packet representation model for traffic classification. Based on the differences between the header and payload, the two fundamental components of a network packet, we perform differentiated pre-training according to the byte sequence variations (continuous in the header vs. discontinuous in the payload). A dynamic masking strategy is further introduced to prevent overfitting to fixed byte positions. Once the generic packet representation is obtained, TraGe can be fine-tuned for diverse traffic classification tasks using limited labeled data. Experimental results demonstrate that TraGe significantly outperforms state-of-the-art methods on two traffic classification tasks, with up to a 6.97% performance improvement. Moreover, TraGe exhibits superior robustness under parameter fluctuations and variations in sampling configurations.
Summary / 总结
Traffic classification has a significant impact on maintaining the Quality of Service (QoS) of the network.
High-Speed Generation of Periodic Traffic Patterns on P4TG for DDoS and Burst-Load Evaluation
Authors: Fabian Ihle, Etienne Zink, Michael Menth
First: 2026-06-09T11:11:59+00:00 · Latest: 2026-06-09T11:11:59+00:00
Comments: Accepted for publication at 12th IEEE International Conference on Network Softwarization (NetSoft 2026)
Abstract
Traffic generators are essential tools for evaluating the robustness and performance of networked systems. P4TG is an open-source, hardware-accelerated traffic generator implemented in P4 for the Intel Tofino ASIC. It has been adopted by researchers and industry due to its flexibility and multi-terabit generation capability, and its low cost compared to other traffic generators. However, like most existing generators, it primarily produces constant bit rate traffic, which does not reflect the highly time-varying behavior observed in real networks, such as flashcrowds and microbursts. Such patterns are difficult to emulate at scale with current tools. We present a data plane mechanism for P4TG that shapes periodic, time-varying traffic patterns, including patterns representative of DDoS attacks and burst-load scenarios. Pattern shaping in P4TG can be applied to its generated traffic at an aggregate throughput of up to 4 Tbit/s. We evaluate pattern accuracy and analyze scalability across different sampling resolutions and periods. Further, we demonstrate practical use cases, including zero-loss throughput determination and buffer capacity measurement. Finally, we present microburst-based attack scenarios that overload UDP receivers, switch buffers, and degrade TCP throughput on shared links while remaining undetectable to conventional rate monitoring.
Summary / 总结
Traffic generators are essential tools for evaluating the robustness and performance of networked systems.
LLM-Aided Joint Secrecy Precoding and Trajectory for RSMA-Based Heterogeneous UAV Networks
Authors: Lijie Zheng, Ji He, Shih Yu Chang, Yulong Shen
First: 2025-07-23T04:22:57+00:00 · Latest: 2026-06-09T09:46:35+00:00
Abstract
This paper investigates secure communications in rate-splitting multiple access (RSMA) enabled heterogeneous UAV networks, where multiple UAVs collaboratively serve ground terminals in the presence of eavesdroppers. By jointly considering secrecy rate maximization and propulsion energy consumption minimization, we formulate a multi-objective optimization problem involving UAV trajectory design, service association, power allocation, and secrecy precoding under mobility, collision-avoidance, service-capacity, and communication constraints. The formulated problem is highly non-convex due to the coupling among UAV trajectories, RSMA transmission variables, and secrecy constraints. To address the resulting non-convex and highly coupled optimization problem, we propose a hierarchical optimization framework. The inner layer uses a semidefinite relaxation (SDR)-based S2DC algorithm combining penalty functions and difference-of-convex (D.C.) programming to solve the secrecy precoding problem with fixed UAV positions. The outer layer introduces a Large Language Model (LLM)-guided heuristic multi-agent reinforcement learning approach (LLM-HeMARL) for trajectory optimization. LLM-HeMARL efficiently incorporates an LLM-generated expert heuristic policy, enabling UAVs to learn energy-aware, security-driven trajectories without the inference overhead of real-time LLM calls. The simulation results show that our method outperforms existing baselines in secrecy rate and energy efficiency, with consistent robustness across varying UAV swarm sizes and random seeds.
Summary / 总结
This paper investigates secure communications in rate-splitting multiple access (RSMA) enabled heterogeneous UAV networks, where multiple UAVs collaboratively serve ground terminals in the presence of eavesdroppers.
CAMASA: A CAM-based Dataset from the MASA Living Lab
Authors: Salvatore Iandolo, Marco Savarese, Gaetano Orazio Cauchi, Antonio Solida, Martin Klapez, Maurizio Casoni, Angelo Porrello, Carlo Augusto Grazia
First: 2026-06-09T09:45:51+00:00 · Latest: 2026-06-09T09:45:51+00:00
Comments: Accepted for publication at the IEEE 2026 Vehicular Technology Conference (VTC2026-Fall). Dataset will be available at netlab.unimore.it/MASA
Abstract
Trajectory prediction is a key enabler of autonomous and cooperative driving systems. However, most existing benchmarks are either sensor-centric, geographically constrained, or based on synthetic mobility traces that do not capture real-world V2X communication dynamics. This paper introduces CAMASA, a large-scale infrastructure-based dataset derived from Cooperative Awareness Messages (CAMs) and Decentralized Environmental Notification Messages (DENMs) collected within the Modena Automotive Smart Area (MASA). The dataset comprises more than 40 million CAMs and 2 million DENMs recorded under authentic urban traffic conditions over multiple months. We present a rigorous preprocessing pipeline that includes filtering, pseudonym reconciliation to account for ETSI privacy-driven stationID changes, and temporal normalization to 10 Hz trajectories, suitable for motion forecasting and time-series analysis. With over 14,000 km of reconstructed vehicle paths and tens of thousands of unique station IDs, CAMASA provides a statistically significant empirical foundation for research on Cooperative Intelligent Transportation Systems (C-ITS). Beyond trajectory prediction, the dataset enables calibration of microscopic urban traffic simulators (e.g., SUMO) and supports the development of realistic Intelligent Transportation Systems (ITS) Digital Twins by jointly modeling mobility patterns and V2X communication coverage in real deployments.
Summary / 总结
Trajectory prediction is a key enabler of autonomous and cooperative driving systems.
From MWM to iSLIP: A Linear-Algebraic Tutorial on Input-Queued Switch Scheduling
Authors: Xiaotong Yuan, An Guo
First: 2026-06-09T06:43:01+00:00 · Latest: 2026-06-09T06:43:01+00:00
Abstract
This paper uses three objects -- the queue matrix Q, the matching matrix P, and the Lyapunov energy function V = ||Q||^2 -- as a shared mathematical language to explain, within a single framework, the scheduling objective of maximum weight matching (MWM), queue stability under admissible traffic (per-port loads strictly below 1), and the mechanics of iSLIP's Grant-Accept row-column decoupling together with the long-run average service matrix P-bar. The setting throughout is an N-by-N SoC crossbar, where each clock cycle permits at most one cell transfer per input-output port pair. For the experimental comparison, we built a C++ discrete-event simulator and used exact MWM (solved by the Hungarian algorithm) as the performance reference. All three approximate algorithms are given a fixed iteration budget: r = 3 rounds per cycle for iSLIP and for spectral scheduling, and r_sink = 10 Sinkhorn normalization rounds for entropy-regularized optimal transport (OT). Throughput and average cell delay are measured across four traffic patterns. Spectral scheduling and entropy-regularized OT track MWM closely in both throughput and delay across most tested conditions. iSLIP, by contrast, hits a throughput ceiling of roughly 80% under non-uniform admissible traffic at high load (unbalanced pattern w = 0.5, rho_load >= 0.9), with bottleneck queues growing without bound and delays reaching two orders of magnitude above MWM. Under uniform traffic this breakdown does not occur: at rho_load = 0.99 iSLIP delay is about 3.7x that of MWM. The performance gains of spectral scheduling and OT come at an additional per-cycle compute cost on the order of O(r*N^2) multiply-accumulate or exponential operations; whether this overhead is feasible in real hardware -- in terms of die area, power, and timing closure -- remains to be evaluated.
Summary / 总结
This paper uses three objects -- the queue matrix Q, the matching matrix P, and the Lyapunov energy function V = ||Q||^2 -- as a shared mathematical language to explain, within a single framework, the scheduling objective of maximum weight matching (MWM), queue stability under admissible traffic (per-port loads strictly below 1), and the mechanics of iSLIP's Grant-Accept row-column decoupling together with the long-run average service matrix P-bar.
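The MWM objective and the Lyapunov function named in this abstract can be illustrated with a small sketch. It assumes a toy 3-by-3 queue matrix and uses brute-force enumeration of permutations rather than the Hungarian algorithm the authors report using, so it is only practical for very small N. The helper names (`max_weight_matching`, `lyapunov`, `serve`) are hypothetical, and arrivals are ignored.

```python
from itertools import permutations

def max_weight_matching(Q):
    """Brute-force MWM: choose the permutation P maximizing sum_i Q[i][P(i)]."""
    n = len(Q)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        weight = sum(Q[i][perm[i]] for i in range(n))
        if weight > best_weight:  # strict '>' keeps the first maximizer on ties
            best_perm, best_weight = perm, weight
    return best_perm, best_weight

def lyapunov(Q):
    """V = ||Q||^2, the sum of squared queue lengths."""
    return sum(q * q for row in Q for q in row)

def serve(Q, perm):
    """Remove one cell from each matched, non-empty queue (no arrivals)."""
    return [[max(q - 1, 0) if perm[i] == j else q for j, q in enumerate(row)]
            for i, row in enumerate(Q)]

# Toy queue matrix (assumed values): Q[i][j] = cells at input i for output j.
Q = [[3, 0, 1],
     [0, 2, 4],
     [5, 1, 0]]

perm, weight = max_weight_matching(Q)
print(perm, weight)                         # matching and its weight
print(lyapunov(Q), lyapunov(serve(Q, perm)))  # V before and after one service
```

In this toy instance the matching with the largest total queue weight also produces a drop in V after one service step, which is the intuition behind the stability argument for MWM under admissible traffic.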
Secrets Best Not Shared: DNS Privacy Enhancements for the Constrained IoT
Authors: Martine S. Lenders, Thomas C. Schmidt, Matthias Wählisch
First: 2026-06-08T19:22:30+00:00 · Latest: 2026-06-08T19:22:30+00:00
Comments: 20 pages, 20 figures, 2 tables
Abstract
Attackers often identify DNS traffic to disrupt or compromise Internet services. While prior work has focused on encrypting queries using DNS over TLS, HTTPS, or QUIC to counter such attacks, we consider IETF protocols designed for resource-constrained IoT devices and empirically analyze the potential of obfuscating DNS traffic in addition to encryption. We create a dataset of machine-to-machine-compatible data objects along with the corresponding DNS resolution processes, evaluating 296 deployment scenarios of resolving host names, including DNS over the Constrained Application Layer Protocol (CoAP) and an onion routing flavor of CoAP under varying link-layer conditions. We compare them to DNS over HTTPS. Using Random Forest and a header field analysis, we identify the fields that leak the most information. Our findings show that DNS over CoAP with equalized packet lengths, block-wise transfer, and header compression reduces the accuracy of identifying DNS frames to 86% and further to 77% with payload compression. Our approach outperforms DNS over HTTPS, where classifiers always identify DNS frames based on IP addresses. The dataset is publicly available.
Summary / 总结
Attackers often identify DNS traffic to disrupt or compromise Internet services.
Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations
Authors: Arun Malik
First: 2026-06-08T07:15:53+00:00 · Latest: 2026-06-08T07:15:53+00:00
Comments: 7 pages, 6 figures
Abstract
Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the volume, velocity, and complexity of failures. This paper presents an agentic AI architecture for autonomous incident resolution in large-scale network operations. Our system employs a multi-agent orchestration framework where specialized AI agents collaborate to detect, diagnose, and remediate network incidents without human intervention. We describe the architectural principles, including hierarchical agent decomposition, skills-based tool invocation via standardized protocols, structured knowledge encoding from operational runbooks, progressive autonomy with safety boundaries, and closed-loop verification. The architecture has been deployed in production at a major cloud provider, demonstrating that agentic AI systems can achieve autonomous resolution rates exceeding 90% for common incident categories while maintaining safety guarantees through layered authorization and rollback mechanisms. We discuss design tradeoffs, failure modes, and lessons learned from operating autonomous AI agents at scale.
Summary / 总结
Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the volume, velocity, and complexity of failures.
Multi-SPIN: Multi-Access Speculative Inference for Cooperative Token Generation at the Edge
Authors: Haotian Zheng, Zhanwei Wang, Mingyao Cui, Chang Cai, Hongyang Du, Kaibin Huang
First: 2026-06-03T08:16:14+00:00 · Latest: 2026-06-07T06:19:01+00:00
Abstract
Speculative inference (SPIN) was originally developed as an efficient architecture to accelerate Large Language Models (LLMs). In this work, we propose its distributed deployment to enable cooperative token generation in a multiuser edge system; its advantage is to effectively balance computational loads between resource-constrained devices and servers. The resulting architecture, termed Multi-access SPIN (Multi-SPIN), utilizes on-device small language models to generate and upload candidate token drafts, while an edge server operates the LLM to verify them in parallel batches. Given the severe heterogeneity in users' computation and communication capabilities, the draft length emerges as a critical control variable that influences node-level computation loads and multi-access latency, thereby governing the sum token goodput. Consequently, considering frequency-division multiple access, we investigate the problem of multi-access draft control, a joint optimization of draft-length control and bandwidth allocation to maximize sum token goodput. We examine two cases: (1) homogeneous draft lengths across users to facilitate server-side batching, and (2) heterogeneous draft lengths to introduce a new dimension for goodput enhancement. By developing decomposition methods, we reduce these complex optimizations into tractable sub-problems, which allow efficient draft control algorithms to be derived in closed form. Our analysis shows that the optimal bandwidth allocation compensates users with weaker computation-and-communication capabilities in the homogeneous case due to the batching synchronization requirements, whereas its heterogeneous-case counterpart rewards users with higher acceptance rates by relaxing such requirements. Experiments using Llama-2 and Qwen3.5 model pairs across diverse tasks demonstrate that Multi-SPIN improves goodput by up to 88% over heterogeneity-agnostic baselines.
Summary / 总结
Speculative inference (SPIN) was originally developed as an efficient architecture to accelerate Large Language Models (LLMs).
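Why draft length acts as a control variable can be seen in a minimal sketch. It assumes each draft token is accepted independently with probability `alpha`, that verification stops at the first rejection, and that the server emits one token per round in either case, which gives the standard speculative-decoding expectation (1 - alpha^(L+1)) / (1 - alpha). The timing model and all parameter values are illustrative assumptions, not the paper's multi-access formulation (bandwidth allocation and batching are ignored).

```python
def expected_tokens(alpha: float, draft_len: int) -> float:
    """Expected tokens produced per draft/verify round under i.i.d. acceptance."""
    if alpha >= 1.0:
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)

def goodput(alpha: float, draft_len: int, t_draft: float, t_verify: float) -> float:
    """Tokens per second for one user: expected tokens over round duration.

    t_draft is the assumed per-token drafting time on the device; t_verify
    lumps upload and server verification into one assumed constant.
    """
    round_time = draft_len * t_draft + t_verify
    return expected_tokens(alpha, draft_len) / round_time

def best_draft_length(alpha: float, t_draft: float, t_verify: float,
                      max_len: int = 16) -> int:
    """Scan draft lengths 1..max_len and return the goodput-maximizing one."""
    return max(range(1, max_len + 1),
               key=lambda L: goodput(alpha, L, t_draft, t_verify))

print(best_draft_length(alpha=0.8, t_draft=0.01, t_verify=0.05))
```

Longer drafts raise the expected token count with diminishing returns while round time grows linearly, so an interior optimum appears; a user with a lower acceptance rate or a slower device would favor a shorter draft.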
AI-Native Closed-Loop Security for 6G-Enabled Cyber-Physical Systems: From Edge Detection to Network-Wide Mitigation
Authors: Bilal Hussain, Muhammad Bilal, Tan Li, Haris Pervaiz, Xiao Tang, Qinghe Du, Fawad Ahmad, Muhammad Azhar, Jun Zhang
First: 2026-06-06T13:36:59+00:00 · Latest: 2026-06-06T13:36:59+00:00
Comments: 30 pages, 12 figures, survey paper, submitted to IEEE Communications Surveys & Tutorials (IEEE COMST)
Abstract
In sixth-generation (6G) networks, billions of cyber-physical systems (CPSs) - autonomous vehicles, smart grids, industrial robots, and remote-surgical equipment - will run over ultra-reliable low-latency slices, collapsing the gap between a remote breach and physical harm to milliseconds, a budget perimeter firewalls and centralised security operations centres cannot meet. This survey reframes 6G CPS security as a closed-loop, AI-native pipeline that senses at the multi-access edge computing (MEC) tier, using minute-scale call-detail records (CDRs) for baseline learning and sub-millisecond RAN/Open-RAN (O-RAN) telemetry for the latency-critical path. It decides locally with compressed deep models, mitigates network-wide via SDN, NFV, and O-RAN controllers, and retrains through federated learning (FL) and digital-twin (DT) replay. We formalise a per-slice, tail-bounded latency contract on the sense, detect, and mitigate stages, enforced at a slice-dependent tail percentile (p99 for safety-critical URLLC slices). Organising 128 peer-reviewed studies (2017-2026) under a PRISMA 2020 protocol, we (i) map the 6G/CPS threat surface to MITRE ATT&CK and a CDR-observable feature space; (ii) unify edge anomaly detection and DDoS classification across twelve datasets and statistical, graph, and transformer models; (iii) synthesise SDN/NFV/O-RAN primitives into one closed-loop reference architecture; (iv) treat FL, large language models (LLMs), DT, post-quantum cryptography (PQC), zero-trust architecture (ZTA), and explainable AI as cross-cutting enablers, not parallel pillars; and (v) consolidate open problems into five directions spanning data, latency, trust, standardisation, and evaluation.
Summary / 总结
In sixth-generation (6G) networks, billions of cyber-physical systems (CPSs) - autonomous vehicles, smart grids, industrial robots, and remote-surgical equipment - will run over ultra-reliable low-latency slices, collapsing the gap between a remote breach and physical harm to milliseconds, a budget perimeter firewalls and centralised security operations centres cannot meet.
Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning
Authors: Farhad Rezazadeh, Amir Ashtari Gargari, Hatim Chergui, Sandra Lagen, Merouane Debbah, Houbing Song, Lingjia Liu
First: 2025-11-04T17:22:22+00:00 · Latest: 2026-06-05T08:12:49+00:00
Comments: 13 Pages, 3 Figures, 4 Tables
Abstract
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe open radio access network (O-RAN) near-real-time (Near-RT) control via counterfactual dynamics and a world modeling (WM) paradigm that learns an action-conditioned generative state space. This enables quantitative "what-if" forecasting beyond large language models (LLMs) as the primary modeling primitive. Actions such as physical resource blocks (PRBs) are treated as first-class control inputs in a causal world model, and both aleatoric and epistemic uncertainty are modeled for prediction and what-if analysis. An agentic, model predictive control (MPC)-based cross-entropy method (CEM) planner operates over short horizons, using prior-mean rollouts within data-driven PRB bounds to maximize a deterministic reward. The model couples multi-scale structured state-space mixtures (MS3M) with a compact stochastic latent to form WM-MS3M, summarizing key performance indicator (KPI) histories and predicting next-step KPIs under hypothetical PRB sequences. On realistic O-RAN traces, WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference, enabling rare-event simulation and offline policy screening.
Summary / 总结
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty.
Scalable Joint Resource Allocation for SLO-Constrained LLM Inference in Heterogeneous GPU Clouds
Authors: Jiaming Cheng, Duong Tung Nguyen
First: 2026-04-08T18:11:09+00:00 · Latest: 2026-06-05T07:23:00+00:00
Abstract
Serving large language model (LLM) inference in cloud environments requires jointly optimizing model selection, GPU provisioning, parallelism configuration, and workload routing under latency, accuracy, memory, and budget constraints. While mixed-integer linear programming (MILP) can model this problem, its computational cost limits frequent re-optimization under demand variability. Existing heuristics often optimize individual components separately and may become infeasible when system-wide constraints are enforced.
This paper presents a scalable framework for SLO-constrained LLM inference. We formulate the problem as an MILP with a two-phase delay model capturing both prefill and autoregressive decoding under tensor and pipeline parallelism. To solve it efficiently, we develop two constraint-aware heuristics: a Greedy Heuristic (GH) and an Adaptive Greedy Heuristic (AGH). AGH extends GH through multi-start construction, local search, and GPU consolidation. Both methods maintain feasibility through parallelism-aware filtering, cost-based ranking, and adaptive parallelism scaling.
Experiments based on the Azure LLM Inference Trace show that GH generates feasible solutions within one second, while AGH achieves near-optimal performance within three seconds and scales to large instances where exact solvers fail to converge. Under out-of-sample stress with up to 1.5x delay and accuracy inflation, AGH degrades gracefully through provisioned headroom, yielding substantially lower cost and SLO violations than cost-minimal MILP solutions. Across synthetic and real Azure workloads, AGH maintains SLO compliance at significantly lower cost than exact MILP solutions. These results demonstrate that high-quality allocations provide substantial robustness to demand variability while enabling rapid adaptation to workload changes.
Summary / 总结
Serving large language model (LLM) inference in cloud environments requires jointly optimizing model selection, GPU provisioning, parallelism configuration, and workload routing under latency, accuracy, memory, and budget constraints.
Natural Language Access Control (NLAC): From Help Desk Requests to Structured Policies
Authors: Jonas Wessner, Tobias Meuser, Janek Schoffit, Dennis Eisermann, Johannes Deger, Björn Scheuermann, Frank Kargl
First: 2026-06-04T21:22:08+00:00 · Latest: 2026-06-04T21:22:08+00:00
Abstract
Configuring network access control policies in large, complex networks is error-prone and requires significant expert effort. LLMs offer a promising interface for expressing such policies in natural language, but their capability for translating user requests into access policies, and the system architectures best suited to leverage LLMs, remain underexplored. We present an architecture for natural-language access control (NLAC) that uses LLMs to translate user requests into access policies, and introduce NLACBench, a benchmark for evaluating LLM-based intent translation systems in large-scale networks. Our evaluation across multiple state-of-the-art models shows that top-performing LLMs achieve up to 96.9% accuracy in small-network settings, but performance degrades substantially (below 20% for some models) as network size increases. To address this limitation, we identify relevant network components via embedding similarity and construct compact subgraphs that are passed to the LLM. This approach enables scaling to larger networks with up to 98.7% accuracy, while simultaneously reducing inference time, hardware requirements, and operating costs to a constant resource budget. Finally, a case study indicates that top-performing models exhibit largely complementary error patterns, suggesting that intent translation accuracy may be further improved through multi-LLM architectures.
Summary / 总结
Configuring network access control policies in large, complex networks is error-prone and requires significant expert effort.
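The subgraph-selection step mentioned in this abstract (identifying relevant network components via embedding similarity) might look roughly like the sketch below. The component names, the two-dimensional toy embeddings, and the `top_k_components` helper are all hypothetical; the abstract does not state which embedding model or similarity measure the authors use, so cosine similarity is an assumption here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_components(query_vec, component_vecs, k):
    """Return the k component names most similar to the request embedding."""
    ranked = sorted(component_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy embeddings (assumed values); a real system would obtain these from
# an embedding model applied to component descriptions and the user request.
components = {
    "web-server": [0.9, 0.1],
    "db": [0.0, 1.0],
    "printer": [0.5, 0.5],
}
request = [1.0, 0.0]

print(top_k_components(request, components, k=2))
```

Only the selected components (and, presumably, the links among them) would then be serialized into the prompt, which keeps the context size bounded as the network grows.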
DAST: A VLM-LLM Framework for Cross-Interface Anomaly Detection in O-RAN
Authors: Francesco Spinelli, Esteban Municio, Pau Baguer, Gines Garcia-Aviles, Xavier Costa-Perez
First: 2026-06-04T15:05:04+00:00 · Latest: 2026-06-04T15:05:04+00:00
Comments: 7 pages, 5 figures. This work has been submitted to the IEEE for possible publication
Abstract
O-RAN enables a disaggregated baseband stack with programmable functions that communicate over standardized open interfaces. The same openness that enables multi-vendor composition also expands the attack surface across logically decoupled tiers that make up the compute continuum. Among these threats, Denial-of-Service and performance-degradation attacks, which account for the majority of catalogued O-RAN threats, are particularly difficult to detect. Traditional Time-Series Anomaly Detection (TSAD) methods fail in this new regime where labelled baselines are scarce, threats evolve faster than detectors can be retrained, and the high-dimensional multivariate telemetry overwhelms monolithic inference models. To address these challenges, we present DAST, a zero-shot multi-agent framework for cross-interface anomaly detection in O-RAN that chains a three-stage VLM → LLM → VLM pipeline. DAST converts multivariate KPI streams into visual representations, scores textual per-interface descriptions against O-RAN domain knowledge, and verifies suspects on high-resolution heatmaps to output the problematic interfaces, the anomalous time intervals, an indicative O-RAN WG11-aligned operational impact rating and the decision rationale. We evaluate DAST on real network traces collected from an O-RAN testbed under representative performance degradation scenarios, achieving 0.910 F1-Score and 0.843 Accuracy, outperforming state-of-the-art TSAD baselines.
Summary / 总结
O-RAN enables a disaggregated baseband stack with programmable functions that communicate over standardized open interfaces.
Efficient Asynchronous Federated Evaluation with Strategy Similarity Awareness for Intent-Based Networking in Industrial Internet of Things
Authors: Shaowen Qin, Jianfeng Zeng, Haodong Guo, Xiaohuan Li, Jiawen Kang, Qian Chen
First: 2025-11-28T09:03:26+00:00 · Latest: 2026-06-04T13:26:29+00:00
Comments: 12 pages with 7 figures and 4 tables
Abstract
Intent-Based Networking (IBN) offers a promising paradigm for intelligent and automated network control in Industrial Internet of Things (IIoT) environments by translating high-level user intents into executable network strategies. However, frequent strategy deployment and rollback are impractical due to tightly coupled workflows and high downtime costs, while node heterogeneity and privacy constraints further complicate centralized strategy evaluation. To address these challenges, we propose a Federated Evaluation Enhanced Intent-Based Networking framework (FEIBN), which leverages large language models (LLMs) to translate user intents into structured strategy tuples and employs federated learning to support distributed strategy evaluation. To improve training efficiency and reduce communication overhead, we design a Strategy Similarity Aware Federated Learning mechanism (SSAFL), which selects nodes relevant to the task based on strategy similarity and resource status, and triggers asynchronous model uploads only when local updates are significant. Experiments demonstrate that the proposed method improves model accuracy, accelerates convergence, and reduces communication cost compared with the baselines.
Summary / 总结
Intent-Based Networking (IBN) offers a promising paradigm for intelligent and automated network control in Industrial Internet of Things (IIoT) environments by translating high-level user intents into executable network strategies.
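The idea of triggering asynchronous uploads only when local updates are significant can be sketched as a simple threshold test. The abstract does not specify the significance criterion, so the relative L2 change used below, the `should_upload` helper, and the threshold value are assumptions made purely for illustration.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a flat parameter vector."""
    return math.sqrt(sum(x * x for x in vec))

def should_upload(local_params, last_uploaded_params, threshold=0.1):
    """Upload only if the relative change since the last upload exceeds threshold.

    Assumed criterion: ||local - last|| / ||last|| > threshold.
    """
    diff = [a - b for a, b in zip(local_params, last_uploaded_params)]
    reference = max(l2_norm(last_uploaded_params), 1e-12)  # avoid division by zero
    return l2_norm(diff) / reference > threshold

last = [1.0, 1.0]
print(should_upload([1.0, 1.0], last))    # no change since the last upload
print(should_upload([1.01, 1.0], last))   # small drift, below the threshold
print(should_upload([2.0, 1.0], last))    # large drift, above the threshold
```

Raising the threshold trades model freshness for lower communication cost, which is the knob such a mechanism would expose.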
Dual-Mode Wireless Devices for Adaptive Pull and Push-Based Communication
Authors: Sara Cavallero, Fabio Saggese, Junya Shiraishi, Israel Leyva-Mayorga, Shashi Raj Pandey, Chiara Buratti, Petar Popovski
First: 2025-07-31T10:52:35+00:00 · Latest: 2026-06-03T12:31:01+00:00
Comments: Submitted to IEEE Transactions on Communications, Copyright might be transferred without notice
Abstract
This paper introduces a dual-mode communication framework for wireless devices that integrates query-driven (pull) and event-driven (push) transmissions within a unified time-frame structure. Devices typically respond to information requests in pull mode, but if an anomaly is detected, they preempt the regular response to report the critical condition. Additionally, push-based communication is used to proactively send critical data without waiting for a request. This adaptive approach ensures timely, context-aware, and efficient data delivery across different network conditions. To achieve high energy efficiency, we incorporate a wake-up radio mechanism and we design a tailored medium access control (MAC) protocol that supports data traffic belonging to the different communication classes. A comprehensive system-level analysis is conducted, accounting for the wake-up control operation and evaluating three key performance metrics: the success probability of anomaly reports (push traffic), the success probability of query responses (pull traffic) and the total energy consumption. Numerical results characterize the system's behavior and highlight the inherent trade-off between push and pull success probabilities as a function of allocated communication resources. Our analysis demonstrates that the proposed approach achieves up to a 42% reduction in energy consumption per served packet compared to traditional approaches, while maintaining reliable support for both communication paradigms.
Summary / 总结
This paper introduces a dual-mode communication framework for wireless devices that integrates query-driven (pull) and event-driven (push) transmissions within a unified time-frame structure.
Treat Traffic Like Trees: A Semantic-Preserving Hierarchical Graph-Based Expert Framework for Encrypted Traffic Analysis
Authors: Yuantu Luo, Jun Tao, Linxiao Yu, Guang Cheng
First: 2026-06-03T06:52:29+00:00 · Latest: 2026-06-03T06:52:29+00:00
Comments: This work has been submitted to the IEEE for possible publication
Abstract
Graph-based deep learning methods have been widely employed in encrypted traffic analysis to exploit latent correlations across different granularities. However, while complex preprocessing pipelines and sophisticated model structures often achieve strong performance, they may obscure inherent protocol semantics during representation learning. Moreover, the hierarchical structure of protocol layers and their corresponding fields, defined by protocol specifications and routinely utilized in manual traffic analysis, remains underexplored in existing learning frameworks. In this paper, we propose Protocol Tree Graph Attention with Mixture of Experts (PTGAMoE), a semantic-preserving hierarchical graph-based expert framework for encrypted traffic analysis. The field-based graph construction and expert committee design enable PTGAMoE to quantify the model's preferences for specific fields and protocols. Extensive experimental results on representative benchmark datasets under strict no-data-leakage settings demonstrate that PTGAMoE significantly outperforms state-of-the-art (SOTA) models. Furthermore, the semantic-preserving design provides interpretable insights into protocol-level feature importance and expert-level contributions, reflecting the model's decision-making logic in encrypted traffic classification tasks.
Summary / 总结
Graph-based deep learning methods have been widely employed in encrypted traffic analysis to exploit latent correlations across different granularities.
vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models
Authors: Xunzhuo Liu, Huamin Chen, Samzong Lu, Yossi Ovadia, Guohong Wen, Hao Wu, Zhengda Tan, Jintao Zhang, Senan Zedan, Yehudit Kerido, Liav Weiss, Haichen Zhang, Bishen Yu, Asaad Balum, Noa Limoy, Abdallah Samara, Baofa Fan, Brent Salisbury, Ryan Cook, Zhijie Wang, Qiping Pan, Rehan Khan, Avishek Goswami, Houston H. Zhang, Shuyi Wang, Ziang Tang, Fang Han, Zohaib Hassan, Jianqiao Zheng, Avinash Changrani, Xue Liu, Bowei He
First: 2026-02-23T15:00:01+00:00 · Latest: 2026-06-03T06:35:07+00:00
Comments: Technical Report
Abstract
As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing, that is, selecting the right model for each query at inference time, has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The architecture follows two complementary Shannon-inspired views. In the information-theoretic regime, signal extraction reduces the entropy of "which model?" by distilling routing-relevant information from raw queries. In the Boolean-algebraic regime, the decision engine composes functionally complete routing policies from signal conditions. The central innovation is composable signal orchestration: thirteen heterogeneous signal types, spanning sub-millisecond heuristics and neural classifiers for semantics, safety, and modality, are composed through configurable Boolean decision rules into deployment-specific routing policies, so that fundamentally different scenarios (multi-cloud enterprise, privacy-regulated, cost-optimized) are expressed as different configurations over the same architecture. Matched decisions drive semantic model routing via thirteen selection algorithms, while per-decision plugin chains enforce safety constraints including a three-stage HaluGate hallucination detection pipeline and a lightweight episodic memory system with ReflectionGate for personalized multi-turn context. A typed neural-symbolic DSL specifies these routing policies and compiles them to multiple deployment targets, enabling configuration-first adaptation without code changes. Together, these components show that composable signal orchestration enables a single framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.
Summary / 总结
As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing, that is, selecting the right model for each query at inference time, has become a critical systems challenge.
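To make the Boolean-algebraic view in the abstract above more concrete, here is a minimal sketch of composing signal conditions into routing rules with AND, OR, and NOT (a functionally complete set of connectives). It does not use the project's actual DSL or API. The signal names, model names, and helper functions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, bool]            # extracted signal name -> fired or not
Rule = Callable[[Signals], bool]     # a Boolean condition over signals


def sig(name: str) -> Rule:
    """Atomic condition: true when the named signal fired (missing means False)."""
    return lambda s: s.get(name, False)


def all_of(*rules: Rule) -> Rule:
    return lambda s: all(r(s) for r in rules)


def any_of(*rules: Rule) -> Rule:
    return lambda s: any(r(s) for r in rules)


def not_(rule: Rule) -> Rule:
    return lambda s: not rule(s)


def route(signals: Signals, policy: List[Tuple[Rule, str]], default: str) -> str:
    """Return the model of the first matching rule, else the default model."""
    for rule, model in policy:
        if rule(signals):
            return model
    return default


# Hypothetical deployment policy: privacy first, then hard code tasks, then images.
POLICY: List[Tuple[Rule, str]] = [
    (sig("contains_pii"), "local-private-model"),
    (all_of(sig("is_code"), not_(sig("is_simple"))), "large-code-model"),
    (any_of(sig("has_image"), sig("has_audio")), "multimodal-model"),
]
```

A different deployment would, in this sketch, swap only the `POLICY` list while reusing the same combinators, which mirrors the abstract's point that scenarios differ by configuration, not by architecture.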
Toward Autonomous O-RAN: A Multi-Scale Agentic AI Framework for Real-Time Network Control and Management
Authors: Hojjat Navidan, Mohammad Cheraghinia, Jaron Fontaine, Mohamed Seif, Eli De Poorter, H. Vincent Poor, Ingrid Moerman, Adnan Shahid
First: 2026-02-15T12:34:01+00:00 · Latest: 2026-06-02T22:00:53+00:00
Comments: Submitted to the IEEE Networks Journal
Abstract
Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity. Multiple control loops coexist across the service management layer and RAN Intelligent Controller (RIC), while independently developed control applications can interact in unintended ways. In parallel, recent advances in generative Artificial Intelligence (AI) are enabling a shift from isolated AI models toward agentic AI systems that can interpret goals, coordinate multiple models and control functions, and adapt their behavior over time. This article proposes a multi-scale agentic AI framework for O-RAN that organizes RAN intelligence as a coordinated hierarchy across the Non-Real-Time (Non-RT), Near-Real-Time (Near-RT), and Real-Time (RT) control loops: (i) a Large Language Model (LLM) agent in the Non-RT RIC translates operator intent into policies and governs model lifecycles; (ii) Small Language Model (SLM) agents in the Near-RT RIC execute low-latency optimization and can activate, tune, or disable existing control applications; and (iii) Wireless Physical-layer Foundation Model (WPFM) agents near the distributed unit provide fast inference close to the air interface. We describe how these agents cooperate through standardized O-RAN interfaces and telemetry. Using a proof-of-concept implementation built on open-source models, software, and datasets, we demonstrate the proposed agentic approach in two representative scenarios: robust operation under non-stationary conditions and intent-driven slice resource control.
Summary / 总结
Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity.
Inductive Latent Context Persistence: Closing the Post-Handover Cold Start in 6G Radio Access Networks
Authors: Anubhab Banerjee, Daniyal Amir Awan
Venue: ICML 2026
First: 2026-05-01T12:00:06+00:00 · Latest: 2026-06-02T18:13:32+00:00
Abstract
In modern radio access networks (RANs), rule-based handover (HO) decisions (e.g., A3/A5) depend on user equipment (UE) measurements only, so UEs at the same location can receive inconsistent HO outcomes. GNN-based methods improve HO KPIs using richer context than measurements alone. However, recurrent or graph models discard the per-UE recurrent state at HO and reinitialize at the target next-generation Node B (gNB), losing mobility history and forcing the target model to rebuild from post-HO measurements only. We address this post-HO cold start with Inductive Latent Context Persistence (ILCP), compressing the source recurrent state, transporting it on the 3GPP Xn as a 128-byte payload, and adapting it at the target gNB. We model the RAN as a dynamic heterogeneous graph over UE nodes, gNB nodes, measurement edges, and Xn edges. On a Vienna 4G/5G drive-test, ILCP achieves 0.0% ping-pong HOs versus 6.5% for an identical no-transfer baseline and 22.6% for a Transformer baseline; post-HO accuracy improves by +5.1 pp on average (peak +13.3 pp) in the 50-250 ms window. On one NVIDIA GTX 1080 (8 GB), ILCP runs end-to-end at 7.7 ms p99 per handover decision. Under perturbations (shadow fading, NLOS blockage, SSB-burst sparsity), robustly trained ILCP keeps handover failure (HOF) in the 10-13% range. Under the same fixed-reference-label setting, A3/A5 rises from 1.1% to 57-65% HOF when measurements are perturbed, exposing limits of measurement-only rules.
Summary / 总结
In modern radio access networks (RANs), rule-based handover (HO) decisions (e.g., A3/A5) depend on user equipment (UE) measurements only, so UEs at the same location can receive inconsistent HO outcomes.
NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference
Authors: Mubarak Adetunji Ojewale
First: 2026-06-02T17:06:57+00:00 · Latest: 2026-06-02T17:06:57+00:00
Abstract
Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and prefix-cache locality alone, ignoring the topological distance and dynamic congestion between prefill and decode instances. We close this gap with a thin operator-to-scheduler interface, the network cost oracle, and we prove that ignoring the network term renders cache-aware-only scheduling arbitrarily suboptimal as context length grows. NetKV, the O(|D|) per-request greedy that consumes this oracle, has tier rankings that are provably robust to stale telemetry. On a 64-GPU four-tier fat-tree simulator driven by Mooncake traces, NetKV reduces mean TTFT by up to 21.2% over round-robin and 17.6% over a tuned cache+load-aware scheduler, lifts SLO attainment by up to 20.1 percentage points, and keeps the Time Between Tokens overhead below 0.5 ms in every condition tested, with no changes to the transport, inference engine, or hardware.
Summary / 总结
Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget.
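The per-request greedy described in the NetKV abstract above can be pictured as a single pass over the decode instances that adds a network term to the usual load term. The sketch below is not the paper's algorithm or interface. The cost formula (queue delay plus KV bytes divided by link bandwidth), the oracle signature, and all instance names and numbers are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DecodeInstance:
    name: str
    queue_delay_s: float  # hypothetical estimate of compute/queueing delay


# Hypothetical oracle type: (prefill_name, decode_name, kv_bytes) -> transfer seconds.
NetworkCost = Callable[[str, str, float], float]


def pick_decode_instance(
    prefill: str,
    kv_bytes: float,
    candidates: Iterable[DecodeInstance],
    network_cost: NetworkCost,
) -> DecodeInstance:
    """One pass over the candidates, O(|D|): minimise load delay + KV transfer time."""
    return min(
        candidates,
        key=lambda d: d.queue_delay_s + network_cost(prefill, d.name, kv_bytes),
    )


# Illustrative bandwidths in bytes/s; a real oracle would reflect topology and congestion.
BANDWIDTH = {("p0", "d_near"): 50e9, ("p0", "d_far"): 5e9}


def table_oracle(prefill: str, decode: str, kv_bytes: float) -> float:
    return kv_bytes / BANDWIDTH[(prefill, decode)]


CANDIDATES = [
    DecodeInstance("d_near", queue_delay_s=0.020),  # busy but close
    DecodeInstance("d_far", queue_delay_s=0.005),   # idle but distant
]
```

With these made-up numbers, a 10 MB KV cache goes to the idle distant instance, while a 1 GB cache goes to the busier nearby one. This is the qualitative effect the abstract attributes to growing context length.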
When BBR Meets Live Streaming
Authors: Xu Yan, Tong Li, Bo Wu, Cheng Luo, Jiuxiang Zhu, Laizhong Cui
First: 2026-06-02T10:46:17+00:00 · Latest: 2026-06-02T10:46:17+00:00
Abstract
Recently, industrial pioneers like Amazon, Tencent, ByteDance, and Huawei have been adopting BBR as their congestion control algorithm for live-streaming applications, including TikTok Live. However, BBR, originally crafted for bulk data transmission, faces multiple challenges in live-streaming scenarios. In this paper, we first explore two key issues that arise because BBR estimates bandwidth inaccurately in live-streaming scenarios: (i) BBR cannot easily exit its startup phase, resulting in severe self-inflicted loss; and (ii) BBR sends data at a lower rate than the available bandwidth during its stable phase. We then propose BBR-Copilot, an auxiliary congestion control component that cooperates with BBR so that it adapts better to live-streaming scenarios. BBR-Copilot proactively generates accurate bandwidth measurement samples by creating and sending extra data. We implement a BBR-Copilot prototype on top of QUIC and evaluate it on a testbed. Experimental results show that BBR-Copilot effectively enhances BBR's performance in live-streaming scenarios.
Summary / 总结
Recently, industrial pioneers like Amazon, Tencent, ByteDance, and Huawei have been adopting BBR as their congestion control algorithm for live-streaming applications, including TikTok Live.
BigDipper: Sharded Censorship Resistant Data Availability for Leader-Based BFT
Authors: Bowen Xue, Samuel Laferriere, Soubhik Deb, Sreeram Kannan
First: 2023-07-03T22:41:27+00:00 · Latest: 2026-06-02T02:19:35+00:00
Abstract
Leader-based Byzantine-fault-tolerant (BFT) protocols provide low latency and simple communication structure, but they give the leader short-term control over transaction inclusion. A malicious leader can keep the protocol live while delaying or excluding time-sensitive transactions such as auction bids, oracle updates, liquidations, and bridge messages. Existing responses often build a fixed censorship-resistance, hiding, or ordering mechanism into the protocol path, forcing all transactions to pay for the same protection level. BigDipper follows the end-to-end principle: the consensus layer exposes inclusion primitives rather than hardcoding stronger policies. Higher-layer protocols can then choose their own submission strategies and resources, whether through replication, erasure coding, or other mechanisms, to obtain the censorship-resistance, hiding, ordering, or execution guarantees they need. At the core of BigDipper is censorship-resistant data availability, or DA-CR, which certifies available replica-contributed mini-blocks for use by leader-based consensus. A central design goal is that data remains sharded on the consensus critical path: validators do not reconstruct or execute the full payload before voting, but instead check commitments, availability evidence, and the DA-CR inclusion rule. We define DA-CR guarantees for data-tampering resistance, honest mini-block inclusion, and residual leader influence. We then give concrete constructions based on erasure coding and linear commitments, analyze client-tunable transaction submission, and instantiate BigDipper inside HotStuff-2.
Summary / 总结
Leader-based Byzantine-fault-tolerant (BFT) protocols provide low latency and simple communication structure, but they give the leader short-term control over transaction inclusion.
RadioMaster: Multi-Agent System for Autonomous Radio Signal Generation
Authors: Jiazhen Lei, Tianze Cao, Yuxin Sha, Sihan Wang, Bingbing Wang, Fengyuan Zhu, Zeming Yang, Xiaohua Tian
First: 2026-06-01T08:13:07+00:00 · Latest: 2026-06-01T08:13:07+00:00
Abstract
Translating user intents into physical radio signals represents the critical yet notoriously tedious final step in wireless prototyping, as it requires intricate knowledge of physical layer details and presents immense implementation challenges. Large Language Models (LLMs) and multi-agent systems have revolutionized conventional software engineering, raising the compelling question of whether they can resolve these formidable difficulties. However, our investigations reveal that current models experience significant limitations and fail to accomplish this task when applied to radio signal generation. This performance degradation primarily stems from severe domain ignorance and a fundamental insensitivity to physical hardware constraints. To bridge this gap, we introduce RadioMaster, a fully autonomous multi-agent framework designed to seamlessly translate user input into real-world wireless emissions. RadioMaster operates on three synergistic pillars: RadioWiki for domain-specific knowledge retrieval, RadioAgent for collaborative I/Q sample generation alongside hardware configuration, and RadioEmulator for closed-loop physical layer verification. Furthermore, we construct RadioBench, the first comprehensive benchmark tailored specifically for the radio signal generation domain. Extensive real-world evaluations demonstrate that RadioMaster significantly outperforms state-of-the-art (SOTA) baselines regarding configuration viability and signal fidelity.
Summary / 总结
Translating user intents into physical radio signals represents the critical yet notoriously tedious final step in wireless prototyping, as it requires intricate knowledge of physical layer details and presents immense implementation challenges.
Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics
Authors: Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein
First: 2026-05-31T23:53:24+00:00 · Latest: 2026-05-31T23:53:24+00:00
Abstract
Frontier LLMs increasingly decide what a query attends to with a sparse-attention indexer that picks a few KV-cache blocks per query: attention's unit is now a small, reusable chunk. Agentic workloads hammer it: many sub-agents query one large codebase, reusing the same blocks. When that corpus outgrows one GPU it is partitioned across instances, so a query and the blocks it selects often sit on different GPUs: answering it means attention across instances. The reflex of prior cross-instance KV systems is to move the cache: pull the selected blocks to the requester. Multi-head Latent Attention inverts the arithmetic, compressing each token's key and value into one narrow vector, so a routed query row is only ~1 KB, smaller than the chunk it attends; routing the query is then often cheaper than moving the cache. Which primitive wins, over which fabric and request shape, is uncharted, least of all on device-initiated RDMA that makes per-request cross-node transfers cheap. We characterize cross-instance MLA attention on a real multi-node H100 cluster, distilling two reusable artifacts: a topology-aware cost model (probe / transfer / compute / return / merge) and a closed-form route/fetch/local predicate, whose constants we measure on real IBGDA, where the model tracks batched round-trips to within ~7%. At decode it routes the query, trading the cost of moving the cache (a ~3 ms re-adaptation splice for a contiguous chunk, or a scattered gather under selection) for a tens-of-microsecond round trip, and picks the fabric by probe latency, not peak bandwidth. We instantiate the cost model and predicate for MLA, but neither is MLA-specific: they apply wherever compression or sparse selection shrinks attention to small chunks (DeepSeek-V3.2, V4, and GLM-5.1 today). Extending them to a new architecture requires measuring just two coefficients: the routed payload and fetch's move-the-cache cost.
Summary / 总结
Frontier LLMs increasingly decide what a query attends to with a sparse-attention indexer that picks a few KV-cache blocks per query: attention's unit is now a small, reusable chunk.
A Reproducible UAV-Assisted VANET Dataset Generator for Fragmentation Risk Analysis in Intelligent Transportation Systems
Authors: Bappa Muktar, Justin Moskolaï Ngossaha, Adama Nouboukpo
First: 2026-05-31T23:04:41+00:00 · Latest: 2026-05-31T23:04:41+00:00
Abstract
Vehicular Ad Hoc Networks (VANETs) are a key component of Intelligent Transportation Systems, enabling cooperative communication among vehicles and between vehicles and roadside infrastructure. However, their highly dynamic topology makes them vulnerable to network fragmentation, particularly in highway scenarios, low-density traffic conditions, localized accident zones, and communication-stressed environments. Although Unmanned Aerial Vehicles (UAVs) have been increasingly investigated as temporary aerial relays for improving VANET connectivity, reusable, future-labeled, and reproducible datasets designed to support short-term fragmentation risk analysis remain limited. This paper proposes a reproducible UAV-assisted VANET dataset generator for short-term fragmentation risk prediction. The proposed framework simulates a two-lane highway scenario in which vehicles move in opposite directions while UAVs operate as aerial support nodes. It incorporates multiple data collection profiles, including free-flow traffic, localized accidents, sparse extended topologies, dense bursty traffic, and mixed stress conditions. During each simulation episode, the generator periodically extracts mobility, topology, UAV coverage, and communication-window features, then assigns each sample a future fragmentation label based on the network state observed after a configurable prediction horizon. An illustrative generated dataset is descriptively characterized in terms of scenario balance, UAV policy balance, future-label distribution, scenario-specific label behavior, and representative feature ranges. By providing a modular, extensible, and reproducible ns-3-based data-generation framework, this work offers a practical basis for future supervised learning studies and connectivity management strategies in UAV-assisted VANETs.
Summary / 总结
Vehicular Ad Hoc Networks (VANETs) are a key component of Intelligent Transportation Systems, enabling cooperative communication among vehicles and between vehicles and roadside infrastructure.
FLUID: Slack-based Low-latency Delivery
Authors: Michael Luby
First: 2026-05-05T16:40:28+00:00 · Latest: 2026-05-31T16:43:18+00:00
Comments: 22 pages, 3 figures, 3 tables, 18 references
Abstract
We introduce FLUID (Fountain LiqUId Delivery), a protocol that uses fountain coding and receiver feedback for low-latency delivery of data blocks over lossy networks. Idealized Automatic Repeat reQuest (ARQ) protocols are bandwidth-optimal, but must deliver every packet in a block and therefore can require additional rounds under packet loss. FLUID uses a controlled amount of slack to relax this all-packets requirement, allowing delivery to finish once enough encoded packets have been received. This yields substantially tighter delivery latency while remaining deterministically close to the ARQ bandwidth optimum.
FLUID is controlled by a slack parameter $ε$. Under the Loss-Product Rule, delivery finishes once the product of packet loss fractions across transmission rounds falls below $ε$. Thus, FLUID can finish delivery in a small number of rounds even when every round experiences packet loss, while $ε$ controls the gap between FLUID and bandwidth-optimal ARQ.
Summary / 总结
We introduce FLUID (Fountain LiqUId Delivery), a protocol that uses fountain coding and receiver feedback for low-latency delivery of data blocks over lossy networks.
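The Loss-Product Rule stated in the FLUID abstract above reduces to a running product compared against the slack parameter. The sketch below shows only that stopping test. How the loss fraction of each round is measured and what the sender transmits per round are not given in the abstract, so the per-round loss fractions are simply taken as inputs, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def rounds_until_done(loss_fractions: Iterable[float], eps: float) -> Optional[int]:
    """Return the first round after which the product of loss fractions is below eps.

    Returns None if the product never drops below eps for the rounds supplied.
    """
    product = 1.0
    for round_no, loss in enumerate(loss_fractions, start=1):
        product *= loss
        if product < eps:
            return round_no
    return None
```

For example, with 10% loss in every round and a slack of 0.005, the product is 0.1 after one round, 0.01 after two, and 0.001 after three, so delivery would finish in round three even though no round was loss-free.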
A Communication-Centric 6G-LLM Architecture for Scalable Tactical Autonomous Defense Vehicle Networks
Authors: Kiran Khurshid, Shumaila Javaid, Nasir Saeed
Venue: IEEE Network, Early Access, 2026
First: 2026-05-31T16:00:14+00:00 · Latest: 2026-05-31T16:00:14+00:00
Comments: 10 pages, accepted in IEEE Network Magazine
Abstract
The integration of Artificial Intelligence (AI) and emerging 6G networks introduces new opportunities for scalable coordination in tactical autonomous vehicle systems. This paper proposes a communication-centric hierarchical architecture for Tactical Autonomous Defense Vehicle Networks (TADVNs) that models the integration of edge-assisted Large Language Model (LLM) reasoning with 6G-enabled connectivity and semantic communication. The framework is designed to improve coordination efficiency, reduce communication overhead, and enhance latency resilience under increasing fleet-scale operation. Unlike conventional task-specific AI pipelines that rely on structured feature processing and rule-based coordination, the proposed approach incorporates semantic abstraction and context-aware decision support within a layered edge-cloud communication architecture. We evaluate communication and coordination performance via Monte Carlo simulations across fleet sizes of 5-30 vehicles under contested network conditions. Results indicate that at a 30-vehicle scale, the 6G-LLM configuration achieves a 75.2% latency reduction (29.1 ms vs. 117.5 ms), a 68.7 percentage point increase in mission success rate (82.9% vs. 14.2%), and an 88.6% reduction in communication overhead compared to a 5G-based conventional AI baseline. These findings demonstrate measurable benefits in coordination and communication when semantic reasoning is combined with low-latency 6G connectivity.
Summary / 总结
The integration of Artificial Intelligence (AI) and emerging 6G networks introduces new opportunities for scalable coordination in tactical autonomous vehicle systems.
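The headline figures in the abstract above mix a relative reduction (latency) with an absolute difference in percentage points (mission success). The short check below recomputes both from the values quoted in the abstract. The helper name is an assumption, and the overhead figure is omitted because the abstract does not give its underlying values.

```python
def relative_reduction_pct(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Latency: 29.1 ms (6G-LLM) vs. 117.5 ms (5G baseline), a relative reduction.
latency_reduction = relative_reduction_pct(117.5, 29.1)

# Mission success: 82.9% vs. 14.2%, an absolute gap in percentage points.
success_gain_pp = 82.9 - 14.2
```

Rounded to one decimal place these reproduce the reported 75.2% and 68.7 percentage points.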
AI-IoT-Robotics Integration: Survey of Frameworks, Emerging Trends, and the Path Toward Connected Robotics
Authors: Ranulfo Bezerra, Satoshi Tadokoro, Kazunori Ohno
Venue: IEEE Internet of Things Journal, vol. 13, no. 10, pp. 20398-20412, 15 May 2026
First: 2026-05-31T05:10:34+00:00 · Latest: 2026-05-31T05:10:34+00:00
Comments: 15 pages, 3 figures, 3 tables. Published in IEEE Internet of Things Journal
Abstract
The convergence of Artificial Intelligence, the Internet of Things, and Robotics is no longer a futuristic vision; it is rapidly becoming the foundation of real-time, intelligent, and context-aware systems. AI enables perception and reasoning, IoT provides scalable sensing and communication, and robotics delivers embodied actuation. Despite significant progress in pairwise combinations such as AIoT and the Internet of Robotic Things (IoRT), there remains a lack of unified design frameworks that fully integrate all three. This survey synthesizes the state-of-the-art across these domains, emphasizing the emerging role of Small Language Models (SLMs) at the edge and Large Language Models (LLMs) in the cloud for distributed cognition and autonomous decision-making. We propose a modular system architecture that aligns with these trends, analyze persistent gaps in interoperability and feedback control, and classify existing work by integration depth. Our review highlights how hybrid SLM-LLM systems, when coupled with IoT infrastructure and robotic agents, can address challenges in real-time adaptation, scalability, and reliability. This work offers a conceptual and technical roadmap for designing next-generation AI-IoT-Robotic ecosystems that are modular, interpretable, and capable of learning within dynamic environments, paving the way for the emerging paradigm of Connected Robotics and Physical AI.
Summary / 总结
The convergence of Artificial Intelligence, the Internet of Things, and Robotics is no longer a futuristic vision; it is rapidly becoming the foundation of real-time, intelligent, and context-aware systems.
Make a Video Call with LLM: A Measurement Campaign over Six Mainstream Apps
Authors: Jiayang Xu, Xiangjie Huang, Zijie Li, Antariksh Verma, Zili Meng
First: 2025-10-01T04:03:51+00:00 · Latest: 2026-05-30T15:21:07+00:00
Abstract
In 2025, Large Language Model (LLM) services have launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just like chatting with real people. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To address this gap, this paper proposes a comprehensive benchmark across four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we evaluate six mainstream AI video chatbots with this benchmark. We also build an online platform for user studies. The measurements lead to findings that could benefit future optimizations. For example, network latency matters less for AI video chat than for human video chat, and the capabilities of the AI agent matter most to the user experience. Our benchmarking results also open up several research questions for future optimization of AI video chatbots. Availability: https://callarena.net/ for the online evaluation platform and our open-sourced dataset and testbed.
Summary / 总结
In 2025, Large Language Model (LLM) services have launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just like chatting with real people.
AgentxGCore: Agentic AI for Next-Generation Mobile Core Network
Authors: Maria Katarine Santana Barbosa, Kelvin L. Dias
First: 2026-05-29T23:13:46+00:00 · Latest: 2026-05-29T23:13:46+00:00
Comments: This paper has been accepted for publication in IEEE Network
Abstract
To meet the stringent requirements of emerging applications and the increasingly complex network management and operation, the Next Generation Mobile Networks (NextG), or 6G, will adopt an AI-native architecture on the Core Network (CN). As part of this shift, the Third Generation Partnership Project (3GPP) has extended the cellular CN with new functions as a first step toward integrating analytics, Artificial Intelligence (AI), and machine learning. However, those new functionalities are constrained by a centralized approach and managerial complexity. Furthermore, with the rise of Large Language Models (LLMs), a new era in network orchestration and management begins, leveraging and empowering the Intent-based Networking (IBN) paradigm. In addition, AI agents and Agentic AI integrate Reasoning and Acting (ReAct), enabling the use of such intents to continuously interact with the network. Unlike state-of-the-art approaches that primarily employ Agentic AI to mitigate deployment and configuration complexity in the CN, this paper introduces AgentxGCore, which leverages an Agentic AI-Native layer to extend the 3GPP architecture and enable a system based on the existing APIs across the Beyond Next Generation Core (xGC) domain. This proposal establishes an AI-driven closed loop for continuous optimization based on real-time information, enabling self-organization and self-adaptation. Our approach involves a specialized multi-agent system, divided into a network planner agent, capable of visualizing the network state and developing a plan to meet the intents, and a network executor agent, responsible for critiquing and executing the plan. To validate the proposed solution, we built an environment using an open-source CN and heterogeneous datasets, and employed different LLMs to demonstrate its effectiveness.
Summary / 总结
To meet the stringent requirements of emerging applications and the increasingly complex network management and operation, the Next Generation Mobile Networks (NextG), or 6G, will adopt an AI-native architecture on the Core Network (CN).
KISS: Keeping it Simple and Slotted when Learning to Communicate over Wireless
Authors: Kamil Szczech, Maksymilian Wojnar, Krzysztof Rusek, Katarzyna Kosek-Szott, Szymon Szott
First: 2026-05-29T18:56:52+00:00 · Latest: 2026-05-29T18:56:52+00:00
Abstract
A long-standing challenge in distributed wireless systems is ensuring efficient and fair random channel access. Existing solutions often address specific constraints related to timing, periodicity, or centralization, but they typically rely on fixed heuristics. Motivated by recent advances in machine learning (ML), we investigate whether ML agents can autonomously learn efficient and fair access strategies, and whether such learning can offer new insights into medium access control (MAC) design. Rather than proposing a deployable protocol, our aim is to examine whether decentralized learning can rediscover or approximate theoretically efficient random-access mechanisms under minimal assumptions. To this end, we deploy an off-policy Double Deep Q-Network (DDQN) with Bayesian inference to train agents operating over a slotted channel. The resulting method is fully online (no pre-training), fully distributed (independent multi-agent learners), stochastic (non-periodic), and requires no coordination or explicit communication. Extensive simulations show that the learned strategy adapts to varying network conditions and achieves near-theoretical efficiency while maintaining fairness. Ablation studies further reveal that the learned behavior resembles slotted ALOHA with a dynamically adjusted transmission probability, leading us to refer to the method as KISS: Keeping It Simple and Slotted.
Summary / 总结
A long-standing challenge in distributed wireless systems is ensuring efficient and fair random channel access.
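The KISS abstract above likens the learned behavior to slotted ALOHA with a dynamically adjusted transmission probability. For context, the sketch below evaluates the textbook slotted-ALOHA model, in which each of n saturated stations transmits in a slot with probability p and a slot succeeds only if exactly one station transmits. This is the classic analytical model, not the paper's DDQN method, and the function names are illustrative.

```python
import math


def success_probability(n: int, p: float) -> float:
    """Probability that exactly one of n stations transmits in a slot."""
    return n * p * (1.0 - p) ** (n - 1)


def best_transmission_probability(n: int) -> float:
    """Setting the derivative of n*p*(1-p)**(n-1) to zero gives p = 1/n."""
    return 1.0 / n


def peak_efficiency(n: int) -> float:
    """Per-slot success probability at p = 1/n; tends to 1/e as n grows."""
    return success_probability(n, best_transmission_probability(n))
```

In this model the best transmission probability shrinks as the number of contenders grows, which gives one way to read "dynamically adjusted": a station that tracks contention would lower p as the channel becomes busier. The limiting efficiency of 1/e (about 0.368) is the usual reference point for slotted random access.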