超炫剪 posted on 2023-5-19 16:35:47

CVPR 2023 Papers and Open-Source Projects Collection (Papers with Code)

25.78% = 2360 / 9155

CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR 2022), and accepted 2360 papers, for a 25.78% acceptance rate.

Note 1: Issues and contributions are welcome; feel free to share CVPR 2023 papers and open-source projects!
Note 2: For papers from past top CV conferences and other high-quality CV paper roundups, see: https://github.com/amusi/daily-paper-computer-vision
· CVPR 2019
· CVPR 2020
· CVPR 2021
· CVPR 2022
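The acceptance rate quoted above is simply accepted divided by submitted papers; as a quick check of the arithmetic, here is a minimal Python sketch using only the numbers from the announcement:

# Acceptance rate = accepted / submitted (figures from the OpenReview announcement)
accepted, submitted = 2360, 9155
print(f"{accepted / submitted:.2%}")  # prints 25.78%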
[CVPR 2023 Papers with Code Directory]
· Backbone
· CLIP
· MAE
· GAN
· GNN
· MLP
· NAS
· OCR
· NeRF
· DETR
· Prompt
· Diffusion Models
· Avatars
· ReID (Re-Identification)
· Long-Tail
· Vision Transformer
· Vision-Language
· Self-supervised Learning
· Data Augmentation
· Object Detection
· Visual Tracking
· Semantic Segmentation
· Instance Segmentation
· Panoptic Segmentation
· Medical Image Segmentation
· Video Object Segmentation
· Referring Image Segmentation
· Image Matting
· Image Editing
· Low-level Vision
· Super-Resolution
· Deblur
· 3D Point Cloud
· 3D Object Detection
· 3D Semantic Segmentation
· 3D Object Tracking
· 3D Human Pose Estimation
· 3D Semantic Scene Completion
· 3D Registration
· Medical Image
· Image Generation
· Video Generation
· Video Understanding
· Action Detection
· Text Detection
· Knowledge Distillation
· Model Pruning
· Image Compression
· Anomaly Detection
· 3D Reconstruction
· Depth Estimation
· Trajectory Prediction
· Lane Detection
· Image Captioning
· Visual Question Answering
· Sign Language Recognition
· Video Prediction
· Novel View Synthesis
· Zero-Shot Learning
· Stereo Matching
· Scene Graph Generation
· Implicit Neural Representations
· Image Quality Assessment
· Datasets
· New Tasks
· Others

Backbone

Integrally Pre-Trained Transformer Pyramid Networks
· Paper: https://arxiv.org/abs/2211.12735
· Code: https://github.com/sunsmarterjie/iTPN

Stitchable Neural Networks
· Homepage: https://snnet.github.io/
· Paper: https://arxiv.org/abs/2302.06586
· Code: https://github.com/ziplab/SN-Net

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
· Paper: https://arxiv.org/abs/2303.03667
· Code: https://github.com/JierunChen/FasterNet

BiFormer: Vision Transformer with Bi-Level Routing Attention
· Paper: https://arxiv.org/abs/2303.08810
· Code: https://github.com/rayleizhu/BiFormer

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
· Paper: https://arxiv.org/abs/2303.02165
· Code: https://github.com/alibaba/lightweight-neural-architecture-search

Vision Transformer with Super Token Sampling
· Paper: https://arxiv.org/abs/2211.11167
· Code: https://github.com/hhb072/SViT

Hard Patches Mining for Masked Image Modeling
· Paper: None
· Code: None

SMPConv: Self-moving Point Representations for Continuous Convolution
· Paper: https://arxiv.org/abs/2304.02330
· Code: https://github.com/sangnekim/SMPConv

CLIP

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
· Paper: https://arxiv.org/abs/2301.12959
· Code: https://github.com/tobran/GALIP

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
· Paper: https://arxiv.org/abs/2303.06285
· Code: https://github.com/Yueming6568/DeltaEdit

MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
· Paper: https://arxiv.org/abs/2212.06785
· Code: https://github.com/ZrrSkywalker/I2P-MAE

Generic-to-Specific Distillation of Masked Autoencoders
· Paper: https://arxiv.org/abs/2302.14771
· Code: https://github.com/pengzhiliang/G2SD

GAN

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
· Paper: https://arxiv.org/abs/2303.06285
· Code: https://github.com/Yueming6568/DeltaEdit

NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
· Homepage: https://nope-nerf.active.vision/
· Paper: https://arxiv.org/abs/2212.07388
· Code: None

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
· Paper: https://arxiv.org/abs/2211.07600
· Code: https://github.com/eladrich/latent-nerf

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
· Paper: https://arxiv.org/abs/2301.08556
· Code: None

Panoptic Lifting for 3D Scene Understanding with Neural Fields
· Homepage: https://nihalsid.github.io/panoptic-lifting/
· Paper: https://arxiv.org/abs/2212.09802
· Code: None

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
· Homepage: https://redrock303.github.io/nerflix/
· Paper: https://arxiv.org/abs/2303.06919
· Code: None

HNeRV: A Hybrid Neural Representation for Videos
· Homepage: https://haochen-rye.github.io/HNeRV
· Paper: https://arxiv.org/abs/2304.02633
· Code: https://github.com/haochen-rye/HNeRV

DETR

DETRs with Hybrid Matching
· Paper: https://arxiv.org/abs/2207.13080
· Code: https://github.com/HDETR

Prompt

Diversity-Aware Meta Visual Prompting
· Paper: https://arxiv.org/abs/2303.08138
· Code: https://github.com/shikiw/DAM-VP

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
· Paper: https://arxiv.org/abs/2302.14772
· Code: https://github.com/ShunLu91/PA-DA

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars
· Homepage: https://enriccorona.github.io/s3f/
· Paper: https://arxiv.org/abs/2212.06820
· Code: None
· Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
· Homepage: https://augmentedperception.github.io/monoavatar/
· Paper: https://arxiv.org/abs/2304.01436

ReID (Re-Identification)

Clothing-Change Feature Augmentation for Person Re-Identification
· Paper: None
· Code: None

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
· Paper: https://arxiv.org/abs/2303.07065
· Code: https://github.com/vimar-gu/MSINet

Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
· Paper: https://arxiv.org/abs/2304.04205
· Code: None

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space
· Homepage: https://sihyun.me/PVDM/
· Paper: https://arxiv.org/abs/2302.07685
· Code: https://github.com/sihyun-yu/PVDM

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
· Paper: https://arxiv.org/abs/2211.10655
· Code: None

Imagic: Text-Based Real Image Editing with Diffusion Models
· Homepage: https://imagic-editing.github.io/
· Paper: https://arxiv.org/abs/2210.09276
· Code: None

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
· Paper: https://arxiv.org/abs/2211.10656
· Code: None

DiffRF: Rendering-guided 3D Radiance Field Diffusion
· Homepage: https://sirwyver.github.io/DiffRF/
· Paper: https://arxiv.org/abs/2212.01206
· Code: None

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
· Paper: https://arxiv.org/abs/2212.09478
· Code: https://github.com/researchmm/MM-Diffusion

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
· Homepage: https://aminshabani.github.io/housediffusion/
· Paper: https://arxiv.org/abs/2211.13287
· Code: https://github.com/aminshabani/house_diffusion

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
· Paper: https://arxiv.org/abs/2303.05762
· Code: https://github.com/chenweixin107/TrojDiff

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
· Paper: https://arxiv.org/abs/2207.03442
· Code: https://github.com/shiyegao/DDA

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
· Paper: https://arxiv.org/abs/2303.06885
· Code: None

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
· Homepage: https://nv-tlabs.github.io/trace-pace/
· Paper: https://arxiv.org/abs/2304.01893
· Code: None

Generative Diffusion Prior for Unified Image Restoration and Enhancement
· Paper: https://arxiv.org/abs/2304.01247
· Code: None

Conditional Image-to-Video Generation with Latent Flow Diffusion Models
· Paper: https://arxiv.org/abs/2303.13744
· Code: https://github.com/nihaomiao/CVPR23_LFDM

Long-Tail

Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
· Paper: https://arxiv.org/abs/2304.01279
· Code: None

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks
· Paper: https://arxiv.org/abs/2211.12735
· Code: https://github.com/sunsmarterjie/iTPN

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
· Homepage: https://niessnerlab.org/projects/hou2023mask3d.html
· Paper: https://arxiv.org/abs/2302.14746
· Code: None

Learning Trajectory-Aware Transformer for Video Super-Resolution
· Paper: https://arxiv.org/abs/2204.04216
· Code: https://github.com/researchmm/TTVSR

Vision Transformers are Parameter-Efficient Audio-Visual Learners
· Homepage: https://yanbo.ml/project_page/LAVISH/
· Code: https://github.com/GenjiB/LAVISH

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
· Paper: https://arxiv.org/abs/2303.04249
· Code: None

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
· Paper: https://arxiv.org/abs/2301.06051
· Code: https://github.com/Haiyang-W/DSVT

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
· Paper: https://arxiv.org/abs/2211.10772
· Code: https://github.com/ViTAE-Transformer/DeepSolo

BiFormer: Vision Transformer with Bi-Level Routing Attention
· Paper: https://arxiv.org/abs/2303.08810
· Code: https://github.com/rayleizhu/BiFormer

Vision Transformer with Super Token Sampling
· Paper: https://arxiv.org/abs/2211.11167
· Code: https://github.com/hhb072/SViT

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
· Paper: https://arxiv.org/abs/2211.10439
· Code: None

BAEFormer: Bi-directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation
· Paper: None
· Code: None

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
· Paper: https://arxiv.org/abs/2304.03282
· Code: None

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
· Paper: https://arxiv.org/abs/2301.01893
· Code: None

Teaching Structured Vision&Language Concepts to Vision&Language Models
· Paper: https://arxiv.org/abs/2211.11733
· Code: None

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
· Paper: https://arxiv.org/abs/2211.09808
· Code: https://github.com/fundamentalvision/Uni-Perceiver

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
· Paper: https://arxiv.org/abs/2303.00040
· Code: None

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
· Paper: https://arxiv.org/abs/2303.02489
· Code: None

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
· Paper: https://arxiv.org/abs/2303.02483
· Code: https://github.com/BrandonHanx/FAME-ViL

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
· Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html
· Paper: https://arxiv.org/abs/2303.04077
· Code: None

All in One: Exploring Unified Video-Language Pre-training
· Paper: https://arxiv.org/abs/2203.07303
· Code: https://github.com/showlab/all-in-one

Position-guided Text Prompt for Vision Language Pre-training
· Paper: https://arxiv.org/abs/2212.09737
· Code: https://github.com/sail-sg/ptp

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
· Paper: https://arxiv.org/abs/2209.14941
· Code: https://github.com/yanmin-wu/EDA

Align and Attend: Multimodal Summarization with Dual Contrastive Losses
· Homepage: https://boheumd.github.io/A2Summ/
· Paper: https://arxiv.org/abs/2303.07284
· Code: https://github.com/boheumd/A2Summ

Multi-Modal Representation Learning with Text-Driven Soft Masks
· Paper: https://arxiv.org/abs/2304.00719
· Code: None

Learning to Name Classes for Vision and Language Models
· Paper: https://arxiv.org/abs/2304.01830
· Code: None

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
· Paper: https://arxiv.org/abs/2207.02696
· Code: https://github.com/WongKinYiu/yolov7

DETRs with Hybrid Matching
· Paper: https://arxiv.org/abs/2207.13080
· Code: https://github.com/HDETR

Enhanced Training of Query-Based Object Detection via Selective Query Recollection
· Paper: https://arxiv.org/abs/2212.07593
· Code: https://github.com/Fangyi-Chen/SQR

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
· Paper: https://arxiv.org/abs/2303.05892
· Code: https://github.com/LutingWang/OADP

Visual Tracking

Simple Cues Lead to a Strong Multi-Object Tracker
· Paper: https://arxiv.org/abs/2206.04656
· Code: None

Semantic Segmentation

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
· Paper: https://arxiv.org/abs/2303.07224
· Code: https://github.com/THU-LYJ-Lab/AR-Seg

FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
· Paper: https://arxiv.org/abs/2304.02135
· Code: https://github.com/uark-cviu/FREDOM

Medical Image Segmentation

Label-Free Liver Tumor Segmentation
· Paper: https://arxiv.org/abs/2303.14869
· Code: https://github.com/MrGiovanni/SyntheticTumors

Directional Connectivity-based Segmentation of Medical Images
· Paper: https://arxiv.org/abs/2304.00145
· Code: https://github.com/Zyun-Y/DconnNet

Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
· Paper: https://arxiv.org/abs/2305.00673
· Code: https://github.com/DeepMed-Lab-ECNU/BCP

Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
· Paper: https://arxiv.org/abs/2304.00212
· Code: None

Fair Federated Medical Image Segmentation via Client Contribution Estimation
· Paper: https://arxiv.org/abs/2303.16520
· Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce

Ambiguous Medical Image Segmentation using Diffusion Models
· Homepage: https://aimansnigdha.github.io/cimd/
· Paper: https://arxiv.org/abs/2304.04745
· Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models

Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
· Paper: https://arxiv.org/abs/2303.13090
· Code: https://github.com/HengCai-NJU/DeSCO

MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
· Paper: https://arxiv.org/abs/2301.01767
· Code: https://github.com/DeepMed-Lab-ECNU/MagicNet

MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
· Paper: None
· Code: None

Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
· Paper: None
· Code: None

Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation
· Paper: None
· Code: None

SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
· Paper: None
· Code: None

Video Object Segmentation

Two-shot Video Object Segmentation
· Paper: https://arxiv.org/abs/2303.12078
· Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation

(Title missing)
· Paper: https://arxiv.org/abs/2303.07815
· Code: None

Referring Image Segmentation

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
· Paper: https://arxiv.org/abs/2302.07387
· Code: None

3D Point Cloud

Physical-World Optical Adversarial Attacks on 3D Face Recognition
· Paper: https://arxiv.org/abs/2205.13412
· Code: https://github.com/PolyLiYJ/SLAttack.git

IterativePFN: True Iterative Point Cloud Filtering
· Paper: https://arxiv.org/abs/2304.01529
· Code: https://github.com/ddsediri/IterativePFN

3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
· Paper: https://arxiv.org/abs/2301.06051
· Code: https://github.com/Haiyang-W/DSVT

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
· Paper: https://arxiv.org/abs/2301.04467
· Code: None

3D Video Object Detection with Learnable Object-Centric Global Optimization
· Paper: None
· Code: None

Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
· Paper: https://arxiv.org/abs/2304.01464
· Code: https://github.com/azhuantou/HSSDA

3D Semantic Segmentation

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
· Paper: https://arxiv.org/abs/2303.11203
· Code: https://github.com/l1997i/lim3d

3D Semantic Scene Completion

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
· Paper: https://arxiv.org/abs/2302.12251
· Code: https://github.com/NVlabs/VoxFormer

3D Registration

Robust Outlier Rejection for 3D Registration with Variational Bayes
· Paper: https://arxiv.org/abs/2304.01514
· Code: https://github.com/Jiang-HB/VBReg

Low-level Vision

Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
· Paper: https://arxiv.org/abs/2303.06859
· Code: https://github.com/lixinustc/Casual-IR-DIL

Burstormer: Burst Image Restoration and Enhancement Transformer
· Paper: https://arxiv.org/abs/2304.01194
· Code: https://github.com/akshaydudhane16/Burstormer

Super-Resolution

Super-Resolution Neural Operator
· Paper: https://arxiv.org/abs/2303.02584
· Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator
Video Super-Resolution

Learning Trajectory-Aware Transformer for Video Super-Resolution
· Paper: https://arxiv.org/abs/2204.04216
· Code: https://github.com/researchmm/TTVSR

Image Generation

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
· Paper: https://arxiv.org/abs/2301.12959
· Code: https://github.com/tobran/GALIP

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
· Paper: https://arxiv.org/abs/2211.09117
· Code: https://github.com/LTH14/mage

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
· Paper: https://arxiv.org/abs/2304.01816
· Code: None

Few-shot Semantic Image Synthesis with Class Affinity Transfer
· Paper: https://arxiv.org/abs/2304.02321
· Code: None

TopNet: Transformer-based Object Placement Network for Image Compositing
· Paper: https://arxiv.org/abs/2304.03372
· Code: None

Video Generation

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
· Paper: https://arxiv.org/abs/2212.09478
· Code: https://github.com/researchmm/MM-Diffusion

Conditional Image-to-Video Generation with Latent Flow Diffusion Models
· Paper: https://arxiv.org/abs/2303.13744
· Code: https://github.com/nihaomiao/CVPR23_LFDM

Video Understanding

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
· Paper: https://arxiv.org/abs/2209.15280
· Code: https://github.com/TencentARC/TVTS

Frame Flexible Network
· Paper: https://arxiv.org/abs/2303.14817
· Code: https://github.com/BeSpontaneous/FFN

Masked Motion Encoding for Self-Supervised Video Representation Learning
· Paper: https://arxiv.org/abs/2210.06096
· Code: https://github.com/XinyuSun/MME

Action Detection

TriDet: Temporal Action Detection with Relative Boundary Modeling
· Paper: https://arxiv.org/abs/2303.07347
· Code: https://github.com/dingfengshi/TriDet

Text Detection

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
· Paper: https://arxiv.org/abs/2211.10772
· Code: https://github.com/ViTAE-Transformer/DeepSolo

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
· Paper: https://arxiv.org/abs/2302.14290
· Code: None

Generic-to-Specific Distillation of Masked Autoencoders
· Paper: https://arxiv.org/abs/2302.14771
· Code: https://github.com/pengzhiliang/G2SD

Model Pruning

DepGraph: Towards Any Structural Pruning
· Paper: https://arxiv.org/abs/2301.12900
· Code: https://github.com/VainF/Torch-Pruning

Image Compression

Context-Based Trit-Plane Coding for Progressive Image Compression
· Paper: https://arxiv.org/abs/2303.05715
· Code: https://github.com/seungminjeon-github/CTC

Anomaly Detection

Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images
· Paper: https://arxiv.org/abs/2111.13495
· Code: https://github.com/tiangexiang/SQUID

3D Reconstruction

OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
· Paper: https://arxiv.org/abs/2211.12886
· Code: None

SparsePose: Sparse-View Camera Pose Regression and Refinement
· Paper: https://arxiv.org/abs/2211.16991
· Code: None

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
· Paper: https://arxiv.org/abs/2303.02375
· Code: None

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
· Homepage: https://moygcc.github.io/vid2avatar/
· Paper: https://arxiv.org/abs/2302.11566
· Code: https://github.com/MoyGcc/vid2avatar
· Demo: https://youtu.be/EGi47YeIeGQ

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
· Paper: https://arxiv.org/abs/2106.09614
· Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
· Paper: https://arxiv.org/abs/2303.05937
· Code: None

3D Cinemagraphy from a Single Image
· Homepage: https://xingyi-li.github.io/3d-cinemagraphy/
· Paper: https://arxiv.org/abs/2303.05724
· Code: https://github.com/xingyi-li/3d-cinemagraphy

Revisiting Rotation Averaging: Uncertainties and Robust Losses
· Paper: https://arxiv.org/abs/2303.05195
· Code: https://github.com/zhangganlin/GlobalSfMpy

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
· Paper: https://arxiv.org/abs/2211.13874
· Code: https://github.com/csbhr/FFHQ-UV

A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
· Homepage: https://younglbw.github.io/HRN-homepage/
· Paper: https://arxiv.org/abs/2302.14434
· Code: https://github.com/youngLBW/HRN

Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
· Paper: https://arxiv.org/abs/2211.13202
· Code: https://github.com/noahzn/Lite-Mono

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
· Paper: https://arxiv.org/abs/2303.00575
· Code: None

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
· Paper: https://arxiv.org/abs/2303.10876
· Code: https://github.com/MediaBrain-SJTU/EqMotion

Lane Detection

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection
· Paper: https://arxiv.org/abs/2301.02371
· Code: https://github.com/tusen-ai/Anchor3DLane

BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points
· Paper: https://arxiv.org/abs/2210.06006v3
· Code: https://github.com/gigo-team/bev_lane_det

Image Captioning

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
· Paper: https://arxiv.org/abs/2303.02437
· Code: None

Cross-Domain Image Captioning with Discriminative Finetuning
· Paper: https://arxiv.org/abs/2304.01662
· Code: None

Model-Agnostic Gender Debiased Image Captioning
· Paper: https://arxiv.org/abs/2304.03693
· Code: None

Visual Question Answering

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
· Paper: https://arxiv.org/abs/2303.01239
· Code: https://github.com/jingjing12110/MixPHM

Sign Language Recognition

Continuous Sign Language Recognition with Correlation Network
· Paper: https://arxiv.org/abs/2303.03202
· Code: https://github.com/hulianyuyy/CorrNet

Video Prediction

MOSO: Decomposing MOtion, Scene and Object for Video Prediction
· Paper: https://arxiv.org/abs/2303.03684
· Code: https://github.com/anonymous202203/MOSO

Novel View Synthesis

3D Video Loops from Asynchronous Input
· Homepage: https://limacv.github.io/VideoLoop3D_web/
· Paper: https://arxiv.org/abs/2303.05312
· Code: https://github.com/limacv/VideoLoop3D

Zero-Shot Learning

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
· Paper: https://arxiv.org/abs/2303.08698
· Code: https://github.com/Zhicaiwww/Bi-VAEGAN

Semantic Prompt for Few-Shot Learning
· Paper: None
· Code: None

Stereo Matching

Iterative Geometry Encoding Volume for Stereo Matching
· Paper: https://arxiv.org/abs/2303.06615
· Code: https://github.com/gangweiX/IGEV

Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
· Paper: https://arxiv.org/abs/2304.00152
· Code: None

Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation
· Paper: https://arxiv.org/abs/2303.07096
· Code: None

Implicit Neural Representations

Polynomial Implicit Neural Representations For Large Diverse Datasets
· Paper: https://arxiv.org/abs/2303.11424
· Code: https://github.com/Rajhans0/Poly_INR

Image Quality Assessment

Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
· Paper: https://arxiv.org/abs/2304.00451
· Code: None

Datasets

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
· Paper: https://arxiv.org/abs/2303.02760
· Code: None

Align and Attend: Multimodal Summarization with Dual Contrastive Losses
· Homepage: https://boheumd.github.io/A2Summ/
· Paper: https://arxiv.org/abs/2303.07284
· Code: https://github.com/boheumd/A2Summ

GeoNet: Benchmarking Unsupervised Adaptation across Geographies
· Homepage: https://tarun005.github.io/GeoNet/
· Paper: https://arxiv.org/abs/2303.15443

CelebV-Text: A Large-Scale Facial Text-Video Dataset
· Homepage: https://celebv-text.github.io/
· Paper: https://arxiv.org/abs/2303.14717

Others

Interactive Segmentation as Gaussian Process Classification
· Paper: https://arxiv.org/abs/2302.14578
· Code: None

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
· Paper: https://arxiv.org/abs/2302.14677
· Code: None

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
· Homepage: http://bit.ly/splinecam
· Paper: https://arxiv.org/abs/2302.12828
· Code: None

SCOTCH and SODA: A Transformer Video Shadow Detection Framework
· Paper: https://arxiv.org/abs/2211.06885
· Code: None

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
· Homepage: https://ai4ce.github.io/DeepMapping2/
· Paper: https://arxiv.org/abs/2212.06331
· Code: https://github.com/ai4ce/DeepMapping2

RelightableHands: Efficient Neural Relighting of Articulated Hand Models
· Homepage: https://sh8.io/#/relightable_hands
· Paper: https://arxiv.org/abs/2302.04866
· Code: None
· Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4

Token Turing Machines
· Paper: https://arxiv.org/abs/2211.09119
· Code: None

Single Image Backdoor Inversion via Robust Smoothed Classifiers
· Paper: https://arxiv.org/abs/2303.00215
· Code: https://github.com/locuslab/smoothinv

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
· Paper: https://arxiv.org/abs/2106.09614
· Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
· Homepage: https://dolorousrtur.github.io/hood/
· Paper: https://arxiv.org/abs/2212.07242
· Code: https://github.com/dolorousrtur/hood
· Demo: https://www.youtube.com/watch?v=cBttMDPrUYY

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
· Paper: https://arxiv.org/abs/2212.04825
· Code: https://github.com/facebookresearch/Whac-A-Mole.git

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
· Paper: https://arxiv.org/abs/2303.00914
· Code: None

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
· Paper: https://arxiv.org/abs/2303.01052
· Code: None

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
· Paper: https://arxiv.org/abs/2303.00938
· Code: None

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
· Paper: https://arxiv.org/abs/2303.00971
· Code: https://github.com/zhijieshen-bjtu/DOPNet

Learning Neural Parametric Head Models
· Homepage: https://simongiebenhain.github.io/NPHM
· Paper: https://arxiv.org/abs/2212.02761
· Code: None

A Meta-Learning Approach to Predicting Performance and Data Requirements
· Paper: https://arxiv.org/abs/2303.01598
· Code: None

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
· Homepage: https://imagine.enpc.fr/~guedona/MACARONS/
· Paper: https://arxiv.org/abs/2303.03315
· Code: None

Masked Images Are Counterfactual Samples for Robust Fine-tuning
· Paper: https://arxiv.org/abs/2303.03052
· Code: None

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
· Paper: https://arxiv.org/abs/2303.02700
· Code: None

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization
· Paper: https://arxiv.org/abs/2303.02328
· Code: None

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
· Paper: https://arxiv.org/abs/2303.03108
· Code: None

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
· Paper: https://arxiv.org/abs/2301.01217
· Code: https://github.com/jiamingzhang94/Unlearnable-Clusters

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
· Paper: https://arxiv.org/abs/2303.04249
· Code: None

UniHCP: A Unified Model for Human-Centric Perceptions
· Paper: https://arxiv.org/abs/2303.02936
· Code: https://github.com/OpenGVLab/UniHCP

CUDA: Convolution-based Unlearnable Datasets
· Paper: https://arxiv.org/abs/2303.04278
· Code: https://github.com/vinusankars/Convolution-based-Unlearnability

AdaptiveMix: Robust Feature Representation via Shrinking Feature Space
· Paper: https://arxiv.org/abs/2303.01559
· Code: https://github.com/WentianZhang-ML/AdaptiveMix

Physical-World Optical Adversarial Attacks on 3D Face Recognition
· Paper: https://arxiv.org/abs/2205.13412
· Code: https://github.com/PolyLiYJ/SLAttack.git

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
· Paper: https://arxiv.org/abs/2301.06281
· Code: https://carlyx.github.io/DPE/

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
· Paper: https://arxiv.org/abs/2211.12194
· Code: https://github.com/Winfredy/SadTalker

Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
· Paper: None
· Code: None

Sharpness-Aware Gradient Matching for Domain Generalization
· Paper: None
· Code: https://github.com/Wang-pengfei/SAGM

Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization
· Paper: None
· Code: None

Blind Video Deflickering by Neural Filtering with a Flawed Atlas
· Homepage: https://chenyanglei.github.io/deflicker
· Paper: None
· Code: None

RiDDLE: Reversible and Diversified De-identification with Latent Encryptor
· Paper: None
· Code: https://github.com/ldz666666/RiDDLE

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
· Paper: https://arxiv.org/abs/2303.07337
· Code: None

Upcycling Models under Domain and Category Shift
· Paper: https://arxiv.org/abs/2303.07110
· Code: https://github.com/ispc-lab/GLC

Modality-Agnostic Debiasing for Single Domain Generalization
· Paper: https://arxiv.org/abs/2303.07123
· Code: None

Progressive Open Space Expansion for Open-Set Model Attribution
· Paper: https://arxiv.org/abs/2303.06877
· Code: None

Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
· Paper: https://arxiv.org/abs/2303.06856
· Code: None

GFPose: Learning 3D Human Pose Prior with Gradient Fields
· Paper: https://arxiv.org/abs/2212.08641
· Code: https://github.com/Embracing/GFPose

PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
· Paper: https://arxiv.org/abs/2303.11526
· Code: https://github.com/Zhang-VISLab

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
· Paper: https://arxiv.org/abs/2303.11502
· Code: None

Boundary Unlearning
· Paper: https://arxiv.org/abs/2303.11570
· Code: None

ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
· Paper: https://arxiv.org/abs/2303.17096
· Code: https://github.com/alibaba/easyrobust

Zero-shot Model Diagnosis
· Paper: https://arxiv.org/abs/2303.15441
· Code: None

GeoNet: Benchmarking Unsupervised Adaptation across Geographies
· Homepage: https://tarun005.github.io/GeoNet/
· Paper: https://arxiv.org/abs/2303.15443

Quantum Multi-Model Fitting
· Paper: https://arxiv.org/abs/2303.15444
· Code: https://github.com/FarinaMatteo/qmmf

DivClust: Controlling Diversity in Deep Clustering
· Paper: https://arxiv.org/abs/2304.01042
· Code: None

Neural Volumetric Memory for Visual Locomotion Control
· Homepage: https://rchalyang.github.io/NVM
· Paper: https://arxiv.org/abs/2304.01201
· Code: https://rchalyang.github.io/NVM

MonoHuman: Animatable Human Neural Field from Monocular Video
· Homepage: https://yzmblog.github.io/projects/MonoHuman/
· Paper: https://arxiv.org/abs/2304.02001
· Code: https://github.com/Yzmblog/MonoHuman

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
· Homepage: https://nv-tlabs.github.io/trace-pace/
· Paper: https://arxiv.org/abs/2304.01893
· Code: None

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
· Paper: https://arxiv.org/abs/2304.01804
· Code: None

HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
· Paper: https://arxiv.org/abs/2304.01686
· Code: None

On the Stability-Plasticity Dilemma of Class-Incremental Learning
· Paper: https://arxiv.org/abs/2304.01663
· Code: None

Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
· Paper: https://arxiv.org/abs/2304.01482
· Code: None

VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
· Paper: https://arxiv.org/abs/2304.01434
· Code: https://github.com/jaeill/CVPR23-VNE

Detecting and Grounding Multi-Modal Media Manipulation
· Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake
· Paper: https://arxiv.org/abs/2304.02556
· Code: https://github.com/rshaojimmy/MultiModal-DeepFake

Meta-causal Learning for Single Domain Generalization
· Paper: https://arxiv.org/abs/2304.03709
· Code: None

Disentangling Writer and Character Styles for Handwriting Generation
· Paper: https://arxiv.org/abs/2303.14736
· Code: https://github.com/dailenson/SDT
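Since nearly every paper above is linked by arXiv ID, the collection is easy to process programmatically. As a usage sketch (not part of the original list), the minimal Python script below fetches titles for a few sample IDs taken from the entries above via the public arXiv Atom API; the chosen IDs are illustrative only:

import urllib.request
import xml.etree.ElementTree as ET

# Sample arXiv IDs taken from the list above (FasterNet, BiFormer, DepGraph).
ARXIV_IDS = ["2303.03667", "2303.08810", "2301.12900"]
ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_titles(ids):
    # The arXiv API returns an Atom feed with one <entry> element per ID.
    url = "http://export.arxiv.org/api/query?id_list=" + ",".join(ids)
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    for entry in root.iter(ATOM + "entry"):
        arxiv_id = entry.find(ATOM + "id").text.rsplit("/", 1)[-1]
        title = " ".join(entry.find(ATOM + "title").text.split())
        yield arxiv_id, title

for arxiv_id, title in fetch_titles(ARXIV_IDS):
    print(f"{arxiv_id}: {title}")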

Sdvillzew posted on 2023-5-19 16:35:51

CVPR 2023 will be held from Sunday, June 18 through Thursday, June 22, 2023, at the Vancouver Convention Center. Please visit https://cvpr.thecvf.com/ and follow the conference's latest announcements for timely updates.

Note: the above was generated automatically by OpenAI ChatGPT and is for reference only.