25.78% = 2360 / 9155
CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR 2022), and accepted 2360 papers, for a 25.78% acceptance rate.
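As a quick sanity check, the acceptance rate can be recomputed from the two counts quoted in the announcement. This is a minimal Python sketch; the variable names are illustrative and do not come from the original post.

```python
# Counts as quoted in the CVPR 2023 decision announcement above.
accepted = 2360
submitted = 9155

# Acceptance rate as a percentage.
rate = accepted / submitted * 100

print(f"{rate:.2f}%")  # prints 25.78%
```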
Note 1: Everyone is welcome to open an issue to share CVPR 2023 papers and open-source projects!
[CVPR 2023 Open-Source Paper Directory]
Backbone
Integrally Pre-Trained Transformer Pyramid Networks
Stitchable Neural Networks
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
BiFormer: Vision Transformer with Bi-Level Routing Attention
· Paper: None
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
Vision Transformer with Super Token Sampling
Hard Patches Mining for Masked Image Modeling
· Paper: None
· Code: None
SMPConv: Self-moving Point Representations for Continuous Convolution
CLIP
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
MAE
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Generic-to-Specific Distillation of Masked Autoencoders
GAN
DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
NeRF
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
· Code: None
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
· Code: None
Panoptic Lifting for 3D Scene Understanding with Neural Fields
· Code: None
NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
· Code: None
HNeRV: A Hybrid Neural Representation for Videos
DETR
DETRs with Hybrid Matching
Prompt
Diversity-Aware Meta Visual Prompting
NAS
PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
Avatars
Structured 3D Features for Reconstructing Relightable and Animatable Avatars
· Code: None
Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
ReID (Re-Identification)
Clothing-Change Feature Augmentation for Person Re-Identification
· Paper: None
· Code: None
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
· Code: None
Diffusion Models
Video Probabilistic Diffusion Models in Projected Latent Space
Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
· Code: None
Imagic: Text-Based Real Image Editing with Diffusion Models
· Code: None
Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
· Code: None
DiffRF: Rendering-guided 3D Radiance Field Diffusion
· Code: None
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
· Code: None
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
· Code: None
Generative Diffusion Prior for Unified Image Restoration and Enhancement
· Code: None
Conditional Image-to-Video Generation with Latent Flow Diffusion Models
Long-Tail
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
· Code: None
Vision Transformer
Integrally Pre-Trained Transformer Pyramid Networks
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
· Code: None
Learning Trajectory-Aware Transformer for Video Super-Resolution
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
· Code: None
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
BiFormer: Vision Transformer with Bi-Level Routing Attention
Vision Transformer with Super Token Sampling
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
· Code: None
BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation
· Paper: None
· Code: None
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
· Code: None
Vision-Language
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
· Code: None
Teaching Structured Vision&Language Concepts to Vision&Language Models
· Code: None
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
· Code: None
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
· Code: None
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
· Code: None
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
· Code: None
All in One: Exploring Unified Video-Language Pre-training
Position-guided Text Prompt for Vision Language Pre-training
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Multi-Modal Representation Learning with Text-Driven Soft Masks
· Code: None
Learning to Name Classes for Vision and Language Models
· Code: None
Object Detection
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
DETRs with Hybrid Matching
Enhanced Training of Query-Based Object Detection via Selective Query Recollection
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Object Tracking
Simple Cues Lead to a Strong Multi-Object Tracker
· Code: None
Semantic Segmentation
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
Medical Image Segmentation
Label-Free Liver Tumor Segmentation
Directional Connectivity-based Segmentation of Medical Images
Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
· Code: None
Fair Federated Medical Image Segmentation via Client Contribution Estimation
Ambiguous Medical Image Segmentation using Diffusion Models
Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
· Paper: None
· Code: None
Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
· Paper: None
· Code: None
Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation
· Paper: None
· Code: None
SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
· Paper: None
· Code: None
Video Object Segmentation
Two-shot Video Object Segmentation
Under Video Object Segmentation Section
· Code: None
Referring Image Segmentation
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
· Code: None
3D Point Cloud
Physical-World Optical Adversarial Attacks on 3D Face Recognition
IterativePFN: True Iterative Point Cloud Filtering
3D Object Detection
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
· Code: None
3D Video Object Detection with Learnable Object-Centric Global Optimization
· Paper: None
· Code: None
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
3D Semantic Segmentation
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
3D Semantic Scene Completion
3D Registration
Robust Outlier Rejection for 3D Registration with Variational Bayes
Low-level Vision
Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
Burstormer: Burst Image Restoration and Enhancement Transformer
Super-Resolution
Super-Resolution Neural Operator
Video Super-Resolution
Learning Trajectory-Aware Transformer for Video Super-Resolution
Image Generation
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
· Code: None
Few-shot Semantic Image Synthesis with Class Affinity Transfer
· Code: None
TopNet: Transformer-based Object Placement Network for Image Compositing
· Code: None
Video Generation
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Conditional Image-to-Video Generation with Latent Flow Diffusion Models
Video Understanding
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Frame Flexible Network
Masked Motion Encoding for Self-Supervised Video Representation Learning
Action Detection
TriDet: Temporal Action Detection with Relative Boundary Modeling
Text Detection
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Knowledge Distillation
Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
· Code: None
Generic-to-Specific Distillation of Masked Autoencoders
Model Pruning
DepGraph: Towards Any Structural Pruning
Image Compression
Context-Based Trit-Plane Coding for Progressive Image Compression
Anomaly Detection
Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images
3D Reconstruction
OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
· Code: None
SparsePose: Sparse-View Camera Pose Regression and Refinement
· Code: None
NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
· Code: None
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
· Code: None
3D Cinemagraphy from a Single Image
Revisiting Rotation Averaging: Uncertainties and Robust Losses
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
Depth Estimation
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
Trajectory Prediction
IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
· Code: None
EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
Lane Detection
Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection
BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points
Image Captioning
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
· Code: None
Cross-Domain Image Captioning with Discriminative Finetuning
· Code: None
Model-Agnostic Gender Debiased Image Captioning
· Code: None
Visual Question Answering
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Sign Language Recognition
Continuous Sign Language Recognition with Correlation Network
Video Prediction
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
Novel View Synthesis
3D Video Loops from Asynchronous Input
Zero-Shot Learning
Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
Semantic Prompt for Few-Shot Learning
· Paper: None
· Code: None
Stereo Matching
Iterative Geometry Encoding Volume for Stereo Matching
Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
· Code: None
Scene Graph Generation
Prototype-based Embedding Network for Scene Graph Generation
· Code: None
Implicit Neural Representations
Polynomial Implicit Neural Representations For Large Diverse Datasets
Image Quality Assessment
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
· Code: None
Datasets
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
· Code: None
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
GeoNet: Benchmarking Unsupervised Adaptation across Geographies
CelebV-Text: A Large-Scale Facial Text-Video Dataset
Others
Interactive Segmentation as Gaussian Process Classification
· Code: None
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
· Code: None
SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
· Code: None
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
· Code: None
DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
RelightableHands: Efficient Neural Relighting of Articulated Hand Models
· Code: None
Token Turing Machines
· Code: None
Single Image Backdoor Inversion via Robust Smoothed Classifiers
To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
· Code: None
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
· Code: None
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
· Code: None
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
Learning Neural Parametric Head Models
· Code: None
A Meta-Learning Approach to Predicting Performance and Data Requirements
· Code: None
MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
· Code: None
Masked Images Are Counterfactual Samples for Robust Fine-tuning
· Code: None
HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
· Code: None
Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization
· Code: None
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
· Code: None
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
· Code: None
UniHCP: A Unified Model for Human-Centric Perceptions
CUDA: Convolution-based Unlearnable Datasets
AdaptiveMix: Robust Feature Representation via Shrinking Feature Space
Physical-World Optical Adversarial Attacks on 3D Face Recognition
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
· Paper: None
· Code: None
Sharpness-Aware Gradient Matching for Domain Generalization
· Paper: None
Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization
· Paper: None
· Code: None
Blind Video Deflickering by Neural Filtering with a Flawed Atlas
· Paper: None
· Code: None
RiDDLE: Reversible and Diversified De-identification with Latent Encryptor
· Paper: None
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
· Code: None
Upcycling Models under Domain and Category Shift
Modality-Agnostic Debiasing for Single Domain Generalization
· Code: None
Progressive Open Space Expansion for Open-Set Model Attribution
· Code: None
Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
· Code: None
GFPose: Learning 3D Human Pose Prior with Gradient Fields
PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
· Code: None
Boundary Unlearning
· Code: None
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
Zero-shot Model Diagnosis
· Code: None
GeoNet: Benchmarking Unsupervised Adaptation across Geographies
Quantum Multi-Model Fitting
DivClust: Controlling Diversity in Deep Clustering
· Code: None
Neural Volumetric Memory for Visual Locomotion Control
MonoHuman: Animatable Human Neural Field from Monocular Video
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
· Code: None
Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
· Code: None
HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
· Code: None
On the Stability-Plasticity Dilemma of Class-Incremental Learning
· Code: None
Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
· Code: None
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Detecting and Grounding Multi-Modal Media Manipulation
Meta-causal Learning for Single Domain Generalization
· Code: None
Disentangling Writer and Character Styles for Handwriting Generation