• arxiv preprint - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

  • Jun 26 2024
  • Duration: 5 min
  • Podcast

  • Summary

  • In this episode, we discuss 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities by Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir. The paper presents an any-to-any model that extends the capabilities of existing multimodal and multitask foundation models by training a single model on tens of highly diverse modalities, including images, text, geometric data, and more. By converting each data type into discrete tokens and co-training on large-scale datasets, the model handles three times as many tasks and modalities as existing models without sacrificing performance. The authors demonstrate this with a three-billion-parameter model and release the models and training code openly.
