arXiv cs.CV (Computer Vision)

75 items · Generative Image & Video Models

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding arXiv cs.CV (Computer Vision) 8h
NIV: Neural Axis Variations for Variable Font Generation arXiv cs.CV (Computer Vision) 8h
Personal AI Agent for Camera Roll VQA arXiv cs.CV (Computer Vision) 8h
Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation arXiv cs.CV (Computer Vision) 8h
TopoPult-SSL: Gland-Mask-Free Cross-Device Meibomian Gland Segmentation via Self-Distilled Weak Clinical Priors arXiv cs.CV (Computer Vision) 8h
LightVesselNet: An Ultra-Lightweight Sub-100K Parameter Network for Retinal Blood Vessel Segmentation arXiv cs.CV (Computer Vision) 8h
Recovering Physically Plausible Human-Object Interactions from Monocular Videos arXiv cs.CV (Computer Vision) 8h
Biomazon: A Multimodal Dataset for 3D Forest Structure and Biomass Modeling in the Amazon Basin arXiv cs.CV (Computer Vision) 8h
Three-Dimensional Retinal Microvasculature Restoration in OCT Angiography arXiv cs.CV (Computer Vision) 8h
Deep Learning-assisted AMD Staging based on OCT and OCT Angiography arXiv cs.CV (Computer Vision) 8h
UniPixie: Unified and Probabilistic 3D Physics Learning via Flow Matching arXiv cs.CV (Computer Vision) 8h
Would you still call this Dax? Novel Visual References in VLMs and Humans arXiv cs.CV (Computer Vision) 8h
Disentangled Fine-Grained Prototype Learning for Incomplete Image-Tabular Classification arXiv cs.CV (Computer Vision) 8h
Horse Eye Blink Detection and Classification for Equine Affective State Assessment arXiv cs.CV (Computer Vision) 8h
ORACLE-CT: Anatomy-Aware Support Pooling for CT Classification arXiv cs.CV (Computer Vision) 8h
Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation arXiv cs.CV (Computer Vision) yest
Weakly Supervised Incremental Segmentation via Semantic Anchors and Spatial Arbitration arXiv cs.CV (Computer Vision) yest
Intra-Modal Neighbors Never Lie: Rectifying Inter-Modal Noisy Correspondence via Graph-Based Intra-Modal Reasoning arXiv cs.CV (Computer Vision) yest
Optimal Transport Flow Matching by Design arXiv cs.CV (Computer Vision) yest
When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection arXiv cs.CV (Computer Vision) yest
Reflection Separation from a Single Image via Joint Latent Diffusion arXiv cs.CV (Computer Vision) yest
Pinpoint: Grounded Worldwide Image Geolocation via Cross-Source Retrieval and Reranking arXiv cs.CV (Computer Vision) yest
End-to-End Text Line Detection and Ordering arXiv cs.CV (Computer Vision) yest
GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs arXiv cs.CV (Computer Vision) yest
Spatial Artifact Coherence Determines Codec Robustness in Patch-Based rPPG arXiv cs.CV (Computer Vision) yest
Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1) arXiv cs.CV (Computer Vision) yest
Prospective Dynamic 3D MRI Reconstruction via Latent-Space Motion Tracking from Single Measurement arXiv cs.CV (Computer Vision) yest
SBP-Net: Learning Thin Structure Reconstruction with Sliding-Box Projections arXiv cs.CV (Computer Vision) yest
UniCanvas: A Diffusion-base Unified Model for Text-in-Image Joint Generation arXiv cs.CV (Computer Vision) yest
StandardE2E: A Unified Framework for End-to-End Autonomous Driving Datasets arXiv cs.CV (Computer Vision) yest
COD10K-C: Benchmarking Robustness of Camouflaged Object Detection Under Natural Image Corruptions arXiv cs.CV (Computer Vision) Jun 3
AVTrack: Audio-Visual Tracking in Human-centric Complex Scenes arXiv cs.CV (Computer Vision) Jun 3
Consistent Yet Wrong: Evidence Insensitivity in Spatial Vision-Language Models arXiv cs.CV (Computer Vision) Jun 3
Plan2Map: A Multimodal Benchmark for Document-Grounded Geospatial Boundary Reconstruction from Planning Records arXiv cs.CV (Computer Vision) Jun 3
MetaWorld: Scaling Multi-Agent Video World Model from Single-view Video Data arXiv cs.CV (Computer Vision) Jun 3
From Local Training to Large-Scale Mapping: A Comparative Assessment of Machine Learning and Deep Learning for Transferable Satellite-Derived Bathymetry arXiv cs.CV (Computer Vision) Jun 3
GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving arXiv cs.CV (Computer Vision) Jun 3
Diagnosis of Human Object Interaction Detectors for Real World Educational Applications arXiv cs.CV (Computer Vision) Jun 3
Cosmos 3: Omnimodal World Models for Physical AI arXiv cs.CV (Computer Vision) Jun 3
Automated Report-Derived Oncology VQA Benchmark for Evaluating Vision-Language Models on 3D Medical Imaging arXiv cs.CV (Computer Vision) Jun 3
Principled Reflection Separation via Nonlinear Superposition and Feature Interaction arXiv cs.CV (Computer Vision) Jun 3
Pathway-Structured Privileged Distillation for Deployable Computational Pathology arXiv cs.CV (Computer Vision) Jun 3
Tiny Collaborative Inference for Occlusion-Robust Object Detection arXiv cs.CV (Computer Vision) Jun 3
Any2Poster: Any-Source Poster Generation Across Modalities and Domains arXiv cs.CV (Computer Vision) Jun 3
Pixel Cube: Diffusion-based Portrait Video Relighting Through Realistic Lighting Reproduction arXiv cs.CV (Computer Vision) Jun 3
DefocusTrackerAI -- A Generalized Framework for the Automatic Detection of Defocused Particle Images arXiv cs.CV (Computer Vision) Jun 2
Improved Belief-Attention in Vision Task arXiv cs.CV (Computer Vision) Jun 2
Flow-Based Generative Modeling for Optimizing Sampling Policies in Compressed Sensing Applications arXiv cs.CV (Computer Vision) Jun 2
Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems arXiv cs.CV (Computer Vision) Jun 2
Structured Visual Evidence Decomposition for Evidence-Grounded Multimodal Screening of Obstructive Sleep Apnea-Hypopnea Syndrome arXiv cs.CV (Computer Vision) Jun 2
Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization arXiv cs.CV (Computer Vision) Jun 2
Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry arXiv cs.CV (Computer Vision) Jun 2
Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation arXiv cs.CV (Computer Vision) Jun 2
Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents arXiv cs.CV (Computer Vision) Jun 2
Segmentation-Guided Spatial Indexing for Generalizable and Explainable Deepfake Detection arXiv cs.CV (Computer Vision) Jun 2
CoilDrop-MRI: Self-supervised physics-guided MRI reconstruction with coil dropout arXiv cs.CV (Computer Vision) Jun 2
CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection arXiv cs.CV (Computer Vision) Jun 2
Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning arXiv cs.CV (Computer Vision) Jun 2
VDSB-GWSyn: Diffusion Schrödinger Bridge for Controllable and Anatomically Feasible Guidewire Synthesis in Coronary Angiography arXiv cs.CV (Computer Vision) Jun 2
General Covariant Action Modeling: Constructing Generalized Manifolds via Spatio-Temporal Decoupling arXiv cs.CV (Computer Vision) Jun 2
Lightweight SAR Ship Detection via Contrastive Distillation arXiv cs.CV (Computer Vision) Jun 1
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer arXiv cs.CV (Computer Vision) Jun 1
DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution arXiv cs.CV (Computer Vision) Jun 1
Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement arXiv cs.CV (Computer Vision) Jun 1
Dex2HOI: Dexterous Bimanual Two-Object Interaction Generation arXiv cs.CV (Computer Vision) Jun 1
Clustering Guided Domain-Specific Pretrained Foundation Model Very High-Resolution Arctic Remote Sensing arXiv cs.CV (Computer Vision) Jun 1
A Novel Global Context-aware Deep Neural Network for Enhanced Brain Tumor Segmentation using Magnetic Resonance Images arXiv cs.CV (Computer Vision) Jun 1
OmniMem: Scalable and Adaptive Memory Retrieval for Long Video Generation arXiv cs.CV (Computer Vision) Jun 1
On-Device Generative AI for GDPR-Compliant Visual Monitoring: Natural Language Alerts from Local Object Detection arXiv cs.CV (Computer Vision) Jun 1
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)? arXiv cs.CV (Computer Vision) Jun 1
VLM3: Vision Language Models Are Native 3D Learners arXiv cs.CV (Computer Vision) Jun 1
Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes arXiv cs.CV (Computer Vision) Jun 1
ReGuLaR: Relation-Grounded Latent Reasoning for Large Vision-Language Models arXiv cs.CV (Computer Vision) Jun 1
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs arXiv cs.CV (Computer Vision) Jun 1
Controllable Lung Nodule Synthesis via Histogram-Regularized Latent Diffusion Models arXiv cs.CV (Computer Vision) Jun 1
