Anomaly Detection in Computer Vision
Research portal — papers, people, workshops & frontier trends in AD and broader CV/ML
1. Static image recognition is no longer the center of gravity; video, 3D consistency, and embodied interaction are.
2. Data strategy is becoming as important as model architecture: collection, filtering, attribution, synthetic generation, and evaluation design all matter.
3. A younger layer of agenda-setters is pushing the field toward systems thinking: interfaces, search, tools, simulators, and deployment loops matter almost as much as backbone choice.
4. The field is converging on world-aware representations, but not on one representation family.
5. Large multimodal models are useful today, but senior researchers still disagree on how far ungrounded priors can take us.
6. Simulation has become a first-class research instrument for both training and testing, especially in autonomy and physical AI.
7. Evaluation remains immature: many current benchmarks still reward fluent pattern completion more than causal, temporal, or physically grounded understanding.
Research Themes
The frontier is shifting from static recognition toward models that maintain a stable sense of 3D scene structure, predict consequences, and support camera or agent interventions.
A recurring view among senior vision researchers is that language-only competence is insufficient for robust intelligence; systems need action-conditioned perception and grounding in the physical world.
The field is increasingly skeptical that raw model scale alone will determine progress. Data curation, provenance, synthetic generation, and attribution are becoming core scientific levers.
Much of the field’s energy has moved beyond image classification into long-form video, audio-visual learning, and multimodal systems that must reason over time rather than over isolated frames.
One camp bets on broad foundation models and compositional toolchains; the other bets on stronger structure, simulation, geometry, and embodiment. The frontier likely belongs to hybrids that can use both.