DeepScience
Artificial Intelligence (Partial)

AI alignment and value alignment

Current methods for aligning large language models with human values — RLHF, DPO, constitutional AI — remain brittle and do not scale reliably. Models can exhibit reward hacking, sycophancy, and deceptive alignment, where surface behavior appears aligned while internal objectives diverge. Scalable oversight of superhuman systems, robust value specification, and corrigibility guarantees are unsolved. The gap between behavioral compliance and genuine alignment widens as model capabilities increase.
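Of the methods named above, DPO has the most compact formulation: instead of training a separate reward model as in RLHF, it defines an implicit reward from the log-ratio between the policy and a frozen reference model and optimizes a Bradley-Terry preference loss on that margin directly. Below is a minimal PyTorch sketch, assuming per-sequence log-probabilities for the chosen and rejected completions have already been computed; the function name, tensor names, and the beta value are illustrative, not taken from this page.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) loss
# (Rafailov et al., 2023). Assumes per-sequence log-probabilities are
# precomputed; names and beta are illustrative, not from this page.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push the policy's preference margin above the reference's."""
    # Implicit rewards: scaled log-ratio of policy to frozen reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Bradley-Terry preference loss on the reward difference.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage: random log-probabilities for a batch of 4 preference pairs.
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4),
                torch.randn(4), torch.randn(4))
print(loss.item())
```

One reason this loss is brittle in the ways described above: the implicit reward is only anchored by pairwise preferences, so any behavior that raises the margin (including sycophantic agreement) is rewarded equally.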

Research Domains

safety, foundations

Keywords

alignment, AI safety, RLHF, DPO, constitutional AI, scalable oversight, reward hacking, sycophancy, jailbreak, value alignment, corrigibility, deceptive alignment

Last updated: April 8, 2026

Recent Papers (Artificial Intelligence)

Detecting Rare Cortical Connectivity Around the Human Central Sulcus: A Deep Learning Analysis of 37,000+ Tractographies

April 8, 2026 · openalex

Multi-Map Fusion for Weakly Supervised Disease Localization from Globally Assigned Diagnostic Labels in Brain MRI

April 8, 2026 · openalex

Evaluating Segmentation Using Betti-1 Topological Metric: Application to Nasal Cavities in the Context of Airflow Simulation

April 8, 2026 · openalex

Faster 4D Flow MRI Scan with 3D Arbitrary-Scale Super-Resolution

April 8, 2026 · openalex

Iterative Confidence-Based Pseudo-Labeling for Semi-Supervised Lung Cancer Segmentation under Annotation Scarcity

April 8, 2026 · openalex

FALCON: Unfolded Variational Model for Blind Deconvolution and Segmentation in 3D Dental Imaging

April 8, 2026 · openalex

Diffusion-Based Fourier Domain Deconvolution with Application to Ultrasound Image Restoration

April 8, 2026 · openalex