https://jair.org/index.php/jair/issue/feedJournal of Artificial Intelligence Research2026-01-27T14:46:47+00:00JAIR Editorial Teameditors@jair.orgOpen Journal Systems<p>The Journal of Artificial Intelligence Research (JAIR) is dedicated to the rapid dissemination of important research results to the global artificial intelligence (AI) community. The journal’s scope encompasses all areas of AI, including agents and multi-agent systems, automated reasoning, constraint processing and search, knowledge representation, machine learning, natural language, planning and scheduling, robotics and vision, and uncertainty in AI.</p>https://jair.org/index.php/jair/article/view/21292TeamTTA: Efficient Multi-Device Collaboration for Open-Set Test-Time Adaptation via Cloud Integration2026-01-11T13:39:16+00:00Anqi Luluanqi@stu.hit.edu.cnYoubing Huyoubing@stu.hit.edu.cnYun Chengyun.cheng@sdsc.ethz.chDawei Weiweidawei58@gmail.comZhiqiang Caozhiqiang_cao@stu.hit.edu.cnJie Liumjieliu@outlook.comZhijun Lilizhijun_os@hit.edu.cn<p>Deep neural networks (DNNs) deployed on edge devices often suffer from severe performance degradation when exposed to dynamic and continually shifting environments. Test-time adaptation (TTA) has emerged as a promising solution by updating models online with incoming test data. However, edge deployment poses unique challenges: limited computational resources, latency caused by adaptation delays, and knowledge isolation across devices. The situation becomes even more complex in open-world scenarios, where the presence of unknown categories further disrupts adaptation. To overcome these limitations, we propose TeamTTA, a cloud-integrated framework designed for efficient multi-device collaborative open-set test-time adaptation. Specifically, TeamTTA aggregates reliable samples from multiple edge devices through crowdsourcing, uploads them to the cloud, and maintains a memory buffer for continual adaptation.
A large vision model (LVM) in the cloud leverages its zero-shot generalization ability to filter out open-set samples and acts as a teacher model, distilling its knowledge into a replicated student edge model stored in the cloud. The adapted model parameters, or alternatively global statistics under poor network conditions, are then transmitted back to the edge devices for efficient inference. Extensive experiments on standard public TTA benchmarks, including corrupted and open-set datasets, show that TeamTTA achieves superior adaptation accuracy, robustness to distribution shifts, and communication efficiency, outperforming state-of-the-art TTA baselines. These results validate the effectiveness of integrating cloud-edge collaboration and LVM-driven knowledge distillation for real-world edge intelligence.</p>2026-04-20T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/21105Scaling Neuro-symbolic Problem Solving: Solver-Free Learning of Constraints and Objectives2025-11-20T01:04:07+00:00Marianne Defresnemarianne.defresne@insa-toulouse.frRomain Gambardellaromain.gambardella@telecom-paris.frSophie Barbesophie.barbe@insa-toulouse.frThomas Schiexthomas.schiex@inrae.fr<p class="p1"><span class="s1"><strong>Background</strong>: In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimisation problems from natural inputs, a task that Large Language Models seem to struggle with.</span></p> <p class="p1"><span class="s1"><strong>Objectives</strong>: We introduce a differentiable neuro-symbolic architecture and a loss function dedicated to learning how to solve NP-hard reasoning problems.</span></p> <p class="p1"><span class="s1"><strong>Methods</strong>: Our new probabilistic loss allows for learning both the constraints and the objective – possibly non-linear – of a 
combinatorial problem. Thus, it delivers a complete model that can be scrutinized and completed with side constraints. By pushing the combinatorial solver out of the training loop, our architecture also offers scalable training, while exact inference gives access to maximum accuracy.</span></p> <p class="p1"><span class="s1"><strong>Results</strong>: We empirically show that it can efficiently learn how to solve NP-hard reasoning problems from natural inputs. On three variants of the Sudoku benchmark – symbolic, visual, and many-solution – our approach requires a fraction of the data and training time of other hybrid methods. On a visual Min-Cut/Max-Cut task, it optimizes regret as effectively as a regret-dedicated Decision-Focused Learning loss. Finally, it efficiently learns the energy optimisation formulation of the large real-world problem of designing proteins.</span></p>2026-01-27T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/21001A Review of Causal Decision Making2025-11-16T08:16:38+00:00Lin Gegelin9708@gmail.comHengrui Caihengrc1@uci.eduRunzhe Wanrunzhe.wan@gmail.comYang Xuyxu63@ncsu.eduRui Songsongray@gmail.com<p>To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision making through a causal lens: 1) the discovery of causal relationships through causal structure learning, 2) understanding the impacts of these relationships through causal effect learning, and 3) applying the knowledge gained from the first two aspects to support decision making via causal policy learning. Moreover, we identify challenges that hinder the broader utilization of causal decision making and discuss recent advances in overcoming these challenges.
Finally, we provide future research directions to address these challenges and further enhance the implementation of causal decision making in practice, with real-world applications illustrated through the proposed causal decision-making workflow. To facilitate broader adoption, we additionally integrate relevant methods into a unified Python-based collection, offering a methodological and practical framework for the community (available at https://causaldm.github.io/Causal-Decision-Making).</p>2026-04-20T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/20965Rational Silence and False Polarization: How Viewpoint Organizations and Recommender Systems Distort the Expression of Public Opinion2025-11-06T15:26:24+00:00Atrisha Sarkara9sarkar@uwaterloo.caGillian K. Hadfieldghadfield@jhu.edu<p class="p1">Social media platforms are one of the most important domains in which artificial intelligence (AI) has already transformed the nature of economic and social interaction. AI enables the massive scale and highly personalized nature of online information sharing that we now take for granted. Extensive attention has been devoted to the polarization that social media platforms appear to facilitate. However, a key implication of the transformation we are experiencing due to these AI-powered platforms has received much less attention: how platforms impact what observers of online discourse come to believe about community views. These observers include policymakers and legislators, who look to social media to gauge the prospects for policy and legislative change, as well as developers of AI models trained on large-scale internet data, whose outputs may similarly reflect a distorted view of public opinion. 
In this paper, we present a nested game-theoretic model to show how observed online opinion is produced by the interaction of the decisions made by users about whether and with what rhetorical intensity to share their opinions on a platform, the efforts of viewpoint organizations (such as traditional media and advocacy organizations) that seek to encourage or discourage opinion-sharing online, and the operation of AI-powered recommender systems controlled by social media platforms. We show that signals from ideological viewpoint organizations encourage an increase in rhetorical intensity, leading to the <em>rational silence</em> of moderate users. This, in turn, creates a polarized impression of where average opinions lie. We also show that this observed polarization can be amplified by recommender systems that, pursuant to a platform’s incentive to maximize engagement, encourage the formation of viewpoint communities online that end up seeing a skewed sample of opinion. Unlike in existing models, these well-known online phenomena are attributed here not to distortion in the formation of opinions or to the seeking out of like-minded others, but rather to the interaction of the incentives of users, viewpoint organizations, and platforms implementing recommender systems.
In addition to showing how these interactions can play out in simulations, we identify practical strategies platforms can implement, such as reducing exposure to signals from ideological viewpoint organizations and adopting a tailored approach to content moderation.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/20947General Supervised Learning Framework for Open World Classification2025-11-10T15:48:42+00:00Sai Krishna Theja Bhavarajukrishna.theja98@gmail.comMohammad Amin Basirima.basiri@ou.eduCharles Nicholsoncnicholson@ou.edu<p>In open-world supervised learning for classification, the training data is incomplete with respect to the full set of relevant classes in the application domain. Most existing research on this problem focuses on computer vision, and many of the proposed methodologies are intrinsically tied to specific machine learning algorithms or data types. However, real-world open-world settings may arise in a wide array of problem contexts, each with its own data type and classifier requirements. Although existing research emphasizes the identification of unknown sets or classes, it does not sufficiently address automatically categorizing these new classes and updating predictive models. In this work, we present a framework that addresses all aspects of the open-world classification pipeline. The proposed approach is data- and model-agnostic, making it versatile across different domains. Our framework performs automatic identification and categorization of unknown instances into distinct new classes while dynamically updating predictive models without human intervention. We evaluate it on diverse data types, including images, text, and sensor data, demonstrating effectiveness across experiments with accuracy improvements ranging from 27 to 69 percentage points.
To assess robustness and provide practical guidance, we conduct a comprehensive sensitivity analysis examining the impact of key parameters, including the number of known classes, the Chebyshev confidence parameter, the itemset size parameter, and base classifier quality. Additionally, we provide insights into practical applications through a case study on social media analytics for disaster response, highlighting the adaptability of the framework in real-world scenarios.</p>2026-02-25T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/20868Label-Aware Pseudo-Training Sample Generation for Text Classification2026-01-10T07:16:18+00:00Arash Yousefi Jordehiarashy76@phd.guilan.ac.irSeyed Abolghasem Mirroshandelmirroshandel@guilan.ac.irOwen Rambowowen.rambow@stonybrook.edu<p>Deep learning models excel in various Natural Language Processing (NLP) tasks, but their performance (excluding approaches like zero-shot learning or few-shot learning) relies on ample data, posing challenges in fields with limited datasets. To address the scarcity of training data, several approaches can be taken, such as multi-task learning and data augmentation. Aiming to leverage Large Language Models (LLMs), we propose a data augmentation algorithm. It subtly alters sentences by inserting random words and utilizes LLMs to find the most fitting replacements within their embedding space. Taking inspiration from Prompt Tuning, we shift the focus from optimizing the input prompt to updating the inserted tokens’ embedding vectors by maximizing the conditional generation probability. This allows for large-scale sample generation while implicitly benefiting from the knowledge within LLMs.
The results from our extensive set of experiments on various benchmark text classification tasks show a substantial improvement over the non-augmented baselines.</p>2026-02-27T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/18587Improving Plan Execution Flexibility using Block-Substitution2025-10-27T06:23:51+00:00Sabah Binte Noorsabah.binte.noor@gmail.comFazlul Hasan Siddiquisiddiqui@duet.ac.bd<p>Partial-order plans (POPs) in AI planning facilitate execution flexibility due to their less-constrained nature. Maximizing plan flexibility has been studied through the notions of plan deordering and plan reordering. Plan deordering removes unnecessary action orderings within a plan, while plan reordering modifies them arbitrarily to minimize action orderings. This study, in contrast with traditional plan deordering and reordering strategies, improves a plan’s flexibility by substituting its subplans with actions from outside the plan. Our methodology builds on block deordering, which eliminates orderings in a POP by encapsulating coherent actions in blocks, yielding a hierarchically structured plan termed a Block Decomposed Partial-Order (BDPO) plan. We consider the action blocks in a BDPO plan as candidate subplans for substitution, and ensure that each successful substitution produces a plan with strictly greater flexibility. In addition, this paper employs plan reduction strategies to eliminate redundant actions within a BDPO plan. We also evaluate our approach when combined with MaxSAT-based reorderings.
Our experimental results demonstrate a significant improvement in plan execution flexibility on benchmark problems from the International Planning Competitions (IPC), while maintaining good coverage and execution time.</p>2026-03-26T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/20808Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification2025-10-22T22:26:16+00:00Luca Marzariluca.marzari@univr.itFerdinando Cicaleseferdinando.cicalese@univr.itAlessandro Farinellialessandro.farinelli@univr.it<p>We present <strong>P</strong>robabilistically <strong>T</strong>ightened <strong>Li</strong>near <strong>R</strong>elaxation-based <strong>P</strong>erturbation <strong>A</strong>nalysis (PT-LiRPA), a novel framework that combines over-approximation techniques from LiRPA-based approaches with a sampling-based method to compute tight intermediate reachable sets. In detail, we show that, with negligible computational overhead, PT-LiRPA exploits the estimated reachable sets to significantly tighten the lower and upper linear bounds of a neural network's output, reducing the computational cost of formal verification tools while providing probabilistic guarantees on verification soundness. Extensive experiments on standard formal verification benchmarks, including the International Verification of Neural Networks Competition, show that our PT-LiRPA-based verifier improves robustness certificates, i.e., the certified lower bound of ε perturbation tolerated by the models, by up to 3.31X and 2.26X compared to related work.
Importantly, our probabilistic approach offers a valuable solution for challenging competition entries on which state-of-the-art formal verification methods fail, allowing us to provide answers with high confidence (i.e., at least 99%).</p>2025-12-30T00:00:00+00:00Copyright (c) 2025 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/20729Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles2025-12-21T02:54:15+00:00Siddharth Mehrotras.mehrotra@uva.nlJin Huangjh2642@cam.ac.ukXuelong Fuxlongfu@outlook.comRoel DobbeR.I.J.Dobbe@tudelft.nlClara I. Sánchezc.i.sanchezgutierrez@uva.nlMaarten de RijkeM.deRijke@uva.nl<p><strong>Background</strong>: Trustworthy AI serves as a foundational pillar for two major AI ethics conferences: AIES and FAccT. Current research often adopts techno-centric approaches, focusing primarily on technical attributes such as accuracy, reliability, robustness, and fairness, while overlooking the sociotechnical dimensions critical to understanding AI trustworthiness in real-world contexts.</p> <p><strong>Objectives</strong>: This scoping review aims to examine how the AIES and FAccT communities conceptualize, measure, and validate AI trustworthiness, identifying major gaps and opportunities for advancing a holistic understanding of trustworthy AI systems.</p> <p><strong>Methods</strong>: We conduct a scoping review of the AIES and FAccT conference proceedings to date, systematically analyzing how trustworthiness is defined, operationalized, and applied across different research domains. Our analysis focuses on conceptualization approaches, measurement methods, verification and validation techniques, application areas, and underlying values.</p> <p><strong>Results</strong>: While significant progress has been made in defining technical attributes such as transparency, accountability, and robustness, our findings reveal critical gaps.
Current research predominantly emphasizes technical precision at the expense of social and ethical considerations. The sociotechnical nature of AI systems remains underexplored, and trustworthiness emerges as a contested concept shaped by those with the power to define it.</p> <p><strong>Conclusions</strong>: An interdisciplinary approach combining technical rigor with social, cultural, and institutional considerations is essential for advancing trustworthy AI. We propose actionable measures for the AI ethics community to adopt holistic frameworks that genuinely address the complex interplay between AI systems and society, ultimately promoting responsible technological development that benefits all stakeholders.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Researchhttps://jair.org/index.php/jair/article/view/20723PGB: One-Shot Pruning for BERT via Weight Grouping and Permutation2025-11-28T00:52:37+00:00Hyemin Limhyemin8670@naver.comJaeyeon Leedlwodus159@naver.comDong-Wan Choidchoi@inha.ac.kr<p>Large pretrained language models such as BERT suffer from slow inference and high memory usage due to their huge size. Recent approaches to compressing BERT rely on iterative pruning and knowledge distillation, which, however, are often too complicated and computationally intensive. This paper proposes a novel semi-structured one-shot pruning method for BERT, called Permutation and Grouping for BERT (PGB), which achieves high compression efficiency and sparsity while preserving accuracy. To this end, PGB identifies important groups of individual weights by permutation and prunes all other weights in a structured manner in both the multi-head attention and feed-forward layers. Furthermore, if no important group is formed in a particular layer, PGB drops the entire layer to produce an even more compact model.
Our experimental results on BERT-Base demonstrate that PGB outperforms state-of-the-art structured pruning methods in terms of computational cost and accuracy preservation.</p>2026-03-05T00:00:00+00:00Copyright (c) 2026 Journal of Artificial Intelligence Research