TechcraftingAI Computer Vision

Episodes

Ep. 247 - Part 3 - June 13, 2024

Jun 15 2024

ArXiv Computer Vision research for Thursday, June 13, 2024.

00:21: LRM-Zero: Training Large Reconstruction Models with Synthesized Data

01:56: Scale-Invariant Monocular Depth Estimation via SSI Depth

03:08: GGHead: Fast and Generalizable 3D Gaussian Heads

04:55: Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

06:34: Towards Vision-Language Geo-Foundation Model: A Survey

08:11: SimGen: Simulator-conditioned Driving Scene Generation

09:44: Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

11:03: Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

12:32: LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

13:56: WonderWorld: Interactive 3D Scene Generation from a Single Image

15:21: Modeling Ambient Scene Dynamics for Free-view Synthesis

16:29: Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA

17:50: Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

19:39: Real-Time Deepfake Detection in the Real-World

21:17: OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

23:02: Yo'LLaVA: Your Personalized Language and Vision Assistant

24:30: MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

26:26: Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

28:03: Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

29:59: ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

31:24: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

33:16: Towards Evaluating the Robustness of Visual State Space Models

34:57: Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

36:09: CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

37:37: Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

40:02: MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

41:40: Explore the Limits of Omni-modal Pretraining at Scale

42:46: Interpreting the Weight Space of Customized Diffusion Models

43:58: Depth Anything V2

45:12: An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

46:23: Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

48:11: Rethinking Score Distillation as a Bridge Between Image Distributions

49:44: VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

52 m

Ep. 247 - Part 2 - June 13, 2024

Jun 15 2024

ArXiv Computer Vision research for Thursday, June 13, 2024.

00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques

03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

06:41: Auto-Vocabulary Segmentation for LiDAR Points

07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

10:23: Fine-Grained Domain Generalization with Feature Structuralization

12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution

14:13: ReMI: A Dataset for Reasoning with Multiple Images

15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition

18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns

26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

29:28: Comparison Visual Instruction Tuning

30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

37:30: Parameter-Efficient Active Learning for Foundational models

38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation

48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression

50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models

53 m

Ep. 247 - Part 1 - June 13, 2024

Jun 15 2024

ArXiv Computer Vision research for Thursday, June 13, 2024.

00:21: FouRA: Fourier Low Rank Adaptation

01:41: Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

03:18: Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

04:57: Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting

06:46: ToSA: Token Selective Attention for Efficient Vision Transformers

08:00: Computer vision-based model for detecting turning lane features on Florida's public roadways

09:08: Improving Adversarial Robustness via Feature Pattern Consistency Constraint

10:52: Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

12:10: NeRF Director: Revisiting View Selection in Neural Volume Rendering

13:36: Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency

15:03: Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

16:40: COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

18:16: Fusion of regional and sparse attention in Vision Transformers

19:26: Zoom and Shift are All You Need

20:17: EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

21:49: The Penalized Inverse Probability Measure for Conformal Classification

23:24: OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction

24:47: Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

26:30: Computer Vision Approaches for Automated Bee Counting Application

27:17: Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

28:16: A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

29:43: Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

31:25: Neural NeRF Compression

32:29: Preserving Identity with Variational Score for General-purpose 3D Editing

33:50: AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

34:51: Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

36:10: Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

37:34: AMSA-UNet: An Asymmetric Multiple Scales U-net Based on Self-attention for Deblurring

38:49: Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

40:45: A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding

42:02: Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?

43:28: FacEnhance: Facial Expression Enhancing with Recurrent DDPMs

45:11: How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models

47:08: Suitability of KANs for Computer Vision: A preliminary investigation

48 m

Ep. 246 - Part 3 - June 12, 2024

Jun 13 2024

ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors

09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

14:39: OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images

18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

19:58: Coherent Optical Modems for Full-Wavefield Lidar

21:32: Transformation-Dependent Adversarial Attacks

22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement

28:51: Real2Code: Reconstruct Articulated Objects via Code Generation

30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation

33:12: What If We Recaption Billions of Web Images with LLaMA-3?

34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images

36:07: Enhancing End-to-End Autonomous Driving with Latent World Model

37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats

44 m

Ep. 246 - Part 2 - June 12, 2024

Jun 13 2024

ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:21: From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

01:44: Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

03:20: Adversarial Patch for 3D Local Feature Extractor

04:00: Valeo4Cast: A Modular Approach to End-to-End Forecasting

05:38: The impact of deep learning aid on the workload and interpretation accuracy of radiologists on chest computed tomography: a cross-over reader study

08:50: Universal Scale Laws for Colors and Patterns in Imagery

10:11: CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

11:44: ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

13:25: Continuous fake media detection: adapting deepfake detectors to new generative techniques

15:18: Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment

16:23: One-Step Effective Diffusion Network for Real-World Image Super-Resolution

18:12: 2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

19:22: Diffusion-Promoted HDR Video Reconstruction

21:09: Runtime Freezing: Dynamic Class Loss for Multi-Organ 3D Segmentation

21:52: A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning about Gender and Emotion

23:54: DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

25:28: Using Deep Convolutional Neural Networks to Detect Rendered Glitches in Video Games

26:39: OpenCOLE: Towards Reproducible Automatic Graphic Design Generation

27:23: Dataset Enhancement with Instance-Level Augmentations

28:33: Interpretable Representation Learning of Cardiac MRI via Attribute Regularization

29:33: A New Class Biorthogonal Spline Wavelet for Image Edge Detection

30:48: Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

32:10: Vessel Re-identification and Activity Detection in Thermal Domain for Maritime Surveillance

33:32: AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer

35:09: From Chaos to Clarity: 3DGS in the Dark

36:32: LaMOT: Language-Guided Multi-Object Tracking

38:07: UDON: Universal Dynamic Online distillatioN for generic image representations

39:49: WMAdapter: Adding WaterMark Control to Latent Diffusion Models

40:48: Blind Image Deblurring using FFT-ReLU with Deep Learning Pipeline Integration

42:06: DocSynthv2: A Practical Autoregressive Modeling for Document Generation

43 m

Ep. 246 - Part 1 - June 12, 2024

Jun 13 2024

ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image

01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification

04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts

05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search

07:00: Small Scale Data-Free Knowledge Distillation

08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution

10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding

21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation

24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review

31:49: LVBench: An Extreme Long Video Understanding Benchmark

33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR

36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement

37:29: MWIRSTD: A MWIR Small Target Detection Dataset

38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

44:26: Identification of Conversation Partners from Egocentric Video

46 m

Ep. 245 - Part 3 - June 11, 2024

Jun 13 2024

ArXiv Computer Vision research for Tuesday, June 11, 2024.

00:21: DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

01:44: Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

02:49: Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

04:04: OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

06:01: 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

07:24: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

08:58: Image Neural Field Diffusion Models

10:11: Comparing Deep Learning Models for Rice Mapping in Bhutan Using High Resolution Satellite Imagery

12:29: GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

14:26: ReduceFormer: Attention with Tensor Reduction by Summation

15:23: Trim 3D Gaussian Splatting for Accurate Geometry Representation

16:44: SPIN: Spacecraft Imagery for Navigation

18:24: Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

20:00: Understanding Visual Concepts Across Models

21:12: Instant 3D Human Avatar Generation using Image Diffusion Models

22:47: Neural Gaffer: Relighting Any Object via Diffusion

24:19: Autoregressive Pretraining with Mamba in Vision

25:51: Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

27:19: Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

28:50: Situational Awareness Matters in 3D Vision Language Reasoning

30:10: Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

31:46: Zero-shot Image Editing with Reference Imitation

33:08: Image and Video Tokenization with Binary Spherical Quantization

34:18: An Image is Worth 32 Tokens for Reconstruction and Generation

36:28: Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

38 m

Ep. 245 - Part 2 - June 11, 2024

Jun 13 2024

ArXiv Computer Vision research for Tuesday, June 11, 2024.

00:21: NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

01:27: Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph

03:14: T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

04:45: Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

06:23: FaceGPT: Self-supervised Learning to Chat about 3D Human Faces

07:52: RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation

09:15: VoxNeuS: Enhancing Voxel-Based Neural Surface Reconstruction via Gradient Interpolation

10:51: RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection

12:05: RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

13:52: MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD

15:15: Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation

16:56: MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

18:20: Open-World Human-Object Interaction Detection via Multi-modal Prompts

20:03: Which Country Is This? Automatic Country Ranking of Street View Photos

20:44: Needle In A Multimodal Haystack

22:10: Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models

23:24: Towards Realistic Data Generation for Real-World Super-Resolution

24:37: Unsupervised Object Detection with Theoretical Guarantees

25:43: Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs

27:45: A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation

29:01: Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field

30:24: Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach

32:09: Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

33:52: Deep Implicit Optimization for Robust and Flexible Image Registration

35:28: Visual Representation Learning with Stochastic Frame Prediction

37 m
