Episodes

  • arxiv preprint - Learning Task Decomposition to Assist Humans in Competitive Programming
    Aug 16 2024

    In this episode, we discuss Learning Task Decomposition to Assist Humans in Competitive Programming by Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang. The paper presents a method to enhance human understanding and repair of language model (LM)-generated solutions by automatically breaking down complex solutions into simpler subtasks. They introduce a novel objective called assistive value (AssistV) to measure how easily humans can repair these subtasks and validate their method through a dataset of human repair experiences. The approach significantly improves the problem-solving ability and speed of non-experts in competitive programming, allowing them to solve more problems and match the performance of unassisted experts.

    6 min
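
    For the curious, the selection step reduces to a simple search: propose several candidate decompositions of a solution, score each with a learned assistive-value model, and keep the best. A minimal sketch in Python, where both helper functions are hypothetical placeholders rather than the paper's released code:

    ```python
    # Hedged sketch of AssistV-guided selection (all helpers are placeholders).
    def propose_decompositions(solution: str, n: int = 8) -> list[list[str]]:
        """Placeholder: ask an LM for n candidate subtask breakdowns of `solution`."""
        return [[solution]] * n                      # dummy candidates

    def assistv_score(decomposition: list[str]) -> float:
        """Placeholder: a model trained on logged human repair attempts predicts
        how easily a person could locate and fix a bug given this breakdown."""
        return 0.0                                   # dummy score

    def best_decomposition(solution: str) -> list[str]:
        # Show humans the decomposition predicted to be easiest to repair.
        return max(propose_decompositions(solution), key=assistv_score)
    ```
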
  • arxiv preprint - IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
    Aug 13 2024

    In this episode, we discuss IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts by Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné. The paper discusses IPAdapter-Instruct, a method combining natural-image conditioning with "Instruct" prompts to enable nuanced control over image generation. This approach allows for multiple interpretations (like style transfer or object extraction) of the same conditioning image, addressing limitations of current models that require multiple adapters for different tasks. IPAdapter-Instruct effectively learns various tasks with minimal quality loss, enhancing practical usability in workflows requiring diverse outputs.

    5 min
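
    The core idea, one adapter whose image conditioning is steered by an instruct prompt, can be sketched as a small attention module: the instruct embedding attends over image tokens, so the same image yields different conditioning for different instructions. This is an illustrative guess at the mechanism, not the released implementation; the layer shapes and module name are assumptions:

    ```python
    # Sketch of instruct-steered image conditioning (hypothetical module).
    import torch
    import torch.nn as nn

    class InstructImageAdapter(nn.Module):
        def __init__(self, dim: int = 768, n_image_tokens: int = 16):
            super().__init__()
            self.to_tokens = nn.Linear(dim, dim * n_image_tokens)  # image emb -> tokens
            self.n = n_image_tokens
            self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

        def forward(self, image_emb: torch.Tensor, instruct_emb: torch.Tensor):
            # image_emb: (B, dim), e.g. a CLIP image embedding
            # instruct_emb: (B, T, dim), e.g. encoded "use only the style of this image"
            b, d = image_emb.shape
            img_tokens = self.to_tokens(image_emb).view(b, self.n, d)
            # The instruct prompt attends over image tokens, so one adapter can
            # extract style, composition, or objects from the same image.
            cond, _ = self.attn(instruct_emb, img_tokens, img_tokens)
            return cond  # conditioning tokens fed to the diffusion model

    adapter = InstructImageAdapter()
    cond = adapter(torch.randn(2, 768), torch.randn(2, 5, 768))
    print(cond.shape)  # torch.Size([2, 5, 768])
    ```
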
  • arxiv preprint - Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
    Aug 10 2024

    In this episode, we discuss Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters by Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. The paper explores the impact of increased inference-time computation on Large Language Models (LLMs) to enhance their performance on challenging prompts. It examines two primary methods for scaling test-time computation and finds that their effectiveness varies with the prompt's difficulty, advocating for an adaptive “compute-optimal” strategy. This approach significantly improves test-time compute efficiency and can enable smaller models to outperform much larger ones under computationally equivalent conditions.

    5 min
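
    A toy version of such an adaptive policy: route easy prompts to parallel best-of-N sampling and hard prompts to sequential self-revision, with a verifier picking the final answer. The four helper functions are stand-ins for model calls, not the paper's implementation:

    ```python
    # Hedged sketch of an adaptive "compute-optimal" test-time policy.
    import random

    def sample(prompt): return f"draft:{random.random():.3f}"    # placeholder LM call
    def revise(prompt, answer): return answer + "+rev"           # placeholder revision
    def verifier_score(prompt, answer): return random.random()   # placeholder verifier
    def estimate_difficulty(prompt): return random.random()      # e.g. from draft scores

    def solve(prompt: str, budget: int = 8) -> str:
        if estimate_difficulty(prompt) < 0.5:
            # Easy: spend the budget on parallel samples (best-of-N).
            candidates = [sample(prompt) for _ in range(budget)]
        else:
            # Hard: spend the budget revising a single answer sequentially.
            answer = sample(prompt)
            candidates = [answer]
            for _ in range(budget - 1):
                answer = revise(prompt, answer)
                candidates.append(answer)
        return max(candidates, key=lambda a: verifier_score(prompt, a))
    ```
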
  • arxiv preprint - Language Model Can Listen While Speaking
    Aug 9 2024

    In this episode, we discuss Language Model Can Listen While Speaking by Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen. The paper explores enhancing real-time interaction in speech-based conversational AI by introducing a listening-while-speaking language model (LSLM) for full-duplex communication. LSLM integrates simultaneous listening and speaking capabilities using a token-based decoder-only TTS and a streaming SSL encoder. Experimental results show LSLM's robustness and its sensitivity to diverse instructions, demonstrating its potential to improve interactive speech dialogue systems in real-world applications.

    4 min
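
    A full-duplex loop of this kind can be sketched in a few lines: a speech-token decoder emits tokens while a streaming encoder watches the listening channel and can halt generation on a barge-in. Everything below is a hypothetical placeholder, not the LSLM code:

    ```python
    # Hedged sketch of a listen-while-speak loop (all callables are placeholders).
    EOS = -1

    def encode_listening_chunk(chunk, state): return chunk        # placeholder SSL encoder
    def detects_interruption(state): return state == "barge-in"   # placeholder detector
    def next_speech_token(text, spoken, state): return EOS        # placeholder TTS decoder

    def speak_while_listening(text, mic_chunks):
        spoken, state = [], None
        for chunk in mic_chunks:
            state = encode_listening_chunk(chunk, state)  # fuse listening features
            if detects_interruption(state):               # real-time turn-taking:
                break                                     # stop speaking mid-utterance
            tok = next_speech_token(text, spoken, state)
            if tok == EOS:
                break
            spoken.append(tok)
        return spoken
    ```
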
  • arxiv preprint - Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
    Aug 7 2024

    In this episode, we discuss Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning by Trapoom Ukarapol, Zhicheng Lee, Amy Xin. The paper investigates enhancing smaller language models, like MiniCPM, through improved text embeddings via contrastive fine-tuning on the NLI dataset. Results indicate that this fine-tuning significantly improves performance across multiple benchmarks, with MiniCPM showing a notable 56.33% performance gain. The study's code is available at https://github.com/trapoom555/Language-Model-STS-CFT.

    5 min
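
    Contrastive fine-tuning of this kind typically uses an InfoNCE loss with in-batch negatives over (premise, entailed hypothesis) pairs from NLI. A generic, runnable version in PyTorch; the paper's exact hyperparameters and setup may differ:

    ```python
    # Generic InfoNCE loss with in-batch negatives for sentence embeddings.
    import torch
    import torch.nn.functional as F

    def info_nce(anchor_emb: torch.Tensor, positive_emb: torch.Tensor,
                 temperature: float = 0.05) -> torch.Tensor:
        # anchor_emb, positive_emb: (B, D) embeddings; row i of each tensor
        # comes from the same (premise, hypothesis) pair.
        a = F.normalize(anchor_emb, dim=-1)
        p = F.normalize(positive_emb, dim=-1)
        logits = a @ p.T / temperature              # (B, B); diagonal = positives
        labels = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, labels)      # off-diagonals act as negatives

    # Tiny smoke test with random "embeddings".
    emb_a = torch.randn(8, 256, requires_grad=True)
    emb_p = torch.randn(8, 256, requires_grad=True)
    info_nce(emb_a, emb_p).backward()               # gradients flow to the encoder
    ```
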
  • arxiv preprint - Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
    Aug 6 2024

    In this episode, we discuss Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle by Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan. Recent image-to-3D pipelines rely on multi-view diffusion models that often produce low-quality, inconsistent images, which harm the final reconstructed 3D output. To resolve this, the proposed Cycle3D framework interleaves a 2D diffusion-based generation module and a 3D reconstruction module to iteratively enhance texture quality and multi-view consistency. Experiments show that Cycle3D outperforms state-of-the-art methods in creating high-quality and consistent 3D content.

    4 min
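
    The generation-reconstruction cycle itself is easy to sketch: alternate a 3D pass that enforces cross-view consistency with a 2D diffusion pass that restores texture detail. The three callables below are placeholders for the paper's modules, not released interfaces:

    ```python
    # Conceptual sketch of a generation-reconstruction cycle (placeholder modules).
    def reconstruct_3d(views): return {"views": views}       # placeholder reconstructor
    def render_views(recon): return recon["views"]           # placeholder renderer
    def diffusion_refine(images, step): return images        # placeholder 2D denoiser

    def cycle3d(initial_views, steps: int = 10):
        views = initial_views
        for t in range(steps):
            recon = reconstruct_3d(views)     # 3D pass enforces multi-view consistency
            rendered = render_views(recon)    # re-render at the original cameras
            views = diffusion_refine(rendered, step=t)  # 2D pass restores texture
        return reconstruct_3d(views)
    ```
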
  • arxiv preprint - Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
    Aug 6 2024

    In this episode, we discuss Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent by Shanbo Cheng, Zhichao Huang, Tom Ko, Hang Li, Ningxin Peng, Lu Xu, Qini Zhang. The paper introduces CLASI, a high-quality and human-like Simultaneous Speech Translation (SiST) system inspired by professional interpreters' strategies to balance translation quality and latency. Using a multi-modal retrieval module and Large Language Models (LLMs), CLASI significantly outperforms other systems, especially in challenging real-world scenarios. Evaluated with the valid information proportion (VIP) metric, CLASI achieves impressive results compared to state-of-the-art systems, with VIP scores of 81.3% for Chinese-to-English and 78.0% for English-to-Chinese translation.

    4 min
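
    An interpreter-style read/write policy of this flavor can be sketched as a loop that waits for a complete semantic chunk, retrieves domain terminology, and then emits a partial translation. All callables are hypothetical placeholders, not CLASI's actual components:

    ```python
    # Hedged sketch of a chunk-based simultaneous translation loop (placeholders).
    def chunk_complete(segments): return len(segments) >= 3       # placeholder policy
    def retrieve_terminology(segments): return []                 # placeholder retriever
    def llm_translate(segments, terms, history): return "..."     # placeholder LLM call

    def simultaneous_translate(speech_stream):
        buffer, outputs = [], []
        for segment in speech_stream:           # incoming audio/ASR chunks
            buffer.append(segment)
            if not chunk_complete(buffer):      # balance latency vs. quality by
                continue                        # waiting for a meaningful unit
            terms = retrieve_terminology(buffer)  # retrieval module supplies terms
            outputs.append(llm_translate(buffer, terms, history=outputs))
            buffer = []
        return outputs
    ```
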
  • arxiv preprint - Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
    Jul 31 2024

    In this episode, we discuss Graph-enhanced Large Language Models in Asynchronous Plan Reasoning by Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert. The paper investigates how well large language models (LLMs) like GPT-4 and LLaMA-2 handle reasoning about asynchronous plans and finds that they perform poorly without visual aids. It introduces a new technique, Plan Like a Graph (PLaG), which integrates graphs with language prompts, significantly improving model performance. Despite this improvement, the study highlights the limitations of LLMs when dealing with complex tasks, underscoring the challenges of using them as autonomous agents.

    4 min
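
    The underlying task here is classical: an asynchronous plan is a DAG of steps, and its minimal completion time is the length of the critical path, which is exactly what a PLaG-style graph prompt makes explicit. A self-contained worked example on a toy plan (the plan itself is invented for illustration):

    ```python
    # Minimal completion time of an asynchronous plan = longest path in its DAG.
    from graphlib import TopologicalSorter

    durations = {"boil water": 10, "chop vegetables": 5, "cook pasta": 8, "serve": 2}
    deps = {"cook pasta": {"boil water"}, "serve": {"cook pasta", "chop vegetables"}}

    def minimal_completion_time(durations, deps):
        finish = {}
        # Visit tasks so that every dependency is finished before its dependents.
        for task in TopologicalSorter(deps).static_order():
            ready = max((finish[d] for d in deps.get(task, ())), default=0)
            finish[task] = ready + durations[task]
        return max(finish.values())

    # "chop vegetables" overlaps with boiling/cooking, so it adds no time.
    print(minimal_completion_time(durations, deps))  # 20 = boil(10)+pasta(8)+serve(2)
    ```
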