Roadblock | Artificial Intelligence | Progressing

Unified multimodal understanding

Current vision-language models can describe images and answer questions about them, but they struggle with fine-grained spatial reasoning, temporal understanding in video, and genuine cross-modal inference. Unified architectures that natively process text, images, audio, and video still trail specialized models on many benchmarks. Achieving human-level multimodal understanding that seamlessly integrates perception across modalities, including physical intuition and commonsense spatial reasoning, remains an open challenge.

Uncertainty analysis in digital twins and integration of aleatory uncertainties for virtual entity models

June 10, 2026 | OpenAlex

Digital Ghost

June 10, 2026 | OpenAlex

G-SENSE: Generalized Sensorless External Force Estimation for Humanoid Robots via Centroidal Dynamics

June 10, 2026 | OpenAlex

Human-Centred Guidance in an AI-Driven Labour Market: New Roles, New Responsibilities

June 10, 2026 | OpenAlex

Sustainable Green Computing and Carbon-Aware Artificial Intelligence

Scoping Review of Machine Learning Frameworks for Climate Projection and Adaptation Planning in Senegal
