Episodes

  • Proving Code Correctness: FizzBee and the Future of Formal Methods in Software Design with FizzBee's Creator JP
    Oct 8 2024

    In this episode, we chat with JP, creator of FizzBee, about formal methods and their application in software engineering. We explore the differences between coding and engineering, discussing how formal methods can improve system design and reliability. JP shares insights from his time at Google and explains why tools like FizzBee are crucial for distributed systems. We delve into the challenges of adopting formal methods in industry, the potential of FizzBee to make these techniques more accessible, and how it compares to other tools like TLA+. Finally, we discuss the future of software development, including the role of LLMs in code generation and the ongoing importance of human engineers in system design.

    Links
    FizzBee
    FizzBee Github Repo
    FizzBee Blog

    Chapters
    00:00 Introduction and Overview
    02:42 JP's Experience at Google and the Growth of the Company
    04:51 The Difference Between Engineers and Coders
    06:41 The Importance of Rigor and Quality in Engineering
    10:08 The Limitations of QA and the Need for Formal Methods
    14:00 The Role of Best Practices in Software Engineering
    14:56 Design Specification Languages for System Correctness
    21:43 The Applicability of Formal Methods in Distributed Systems
    31:20 Getting Started with FizzBee: A Practical Example
    36:06 Common Assumptions and Misconceptions in Distributed Systems
    43:23 The Role of FizzBee in the Design Phase
    48:04 The Future of FizzBee: LLMs and Code Generation
    58:20 Getting Started with FizzBee: Tutorials and Online Playground


    Click here to view the episode transcript.

    1 h 1 m
  • MLOps Evolution: Data, Experiments, and AI with Dean Pleban from DagsHub
    Sep 27 2024

    In this episode, we chat with Dean Pleban, CEO of DagsHub, about machine learning operations (MLOps). We explore the differences between DevOps and MLOps, focusing on data management and experiment tracking. Dean shares insights on versioning the various components of ML projects and discusses the importance of user experience in MLOps tools. We also touch on how DagsHub integrates AI into its product and on Dean's vision for the future of AI and machine learning in industry.

    Links

    DagsHub
    The MLOps Podcast
    Dean on LinkedIn

    Chapters

    00:00 Introduction and Background
    03:03 Challenges of Managing Machine Learning Projects
    10:00 The Concept of Experiments in Machine Learning
    12:51 Data Curation and Validation for High-Quality Data
    27:07 Connecting the Components of Machine Learning Projects with DagsHub
    29:12 The Importance of Data and Clear Interfaces
    43:29 Incorporating Machine Learning into DagsHub
    51:27 The Future of ML and AI

    54 m
  • How Denormalized is Building ‘DuckDB for Streaming’ with Apache DataFusion
    Sep 13 2024

    In this episode, Kostas and Nitay are joined by Amey Chaugule and Matt Green, co-founders of Denormalized. They delve into how Denormalized is building an embedded stream processing engine (think “DuckDB for streaming”) to simplify real-time data workloads. Drawing on their extensive backgrounds at companies like Uber, Lyft, Stripe, and Coinbase, Amey and Matt discuss the challenges of existing stream processing systems like Spark, Flink, and Kafka. They explain how their approach leverages Apache DataFusion to create a single-node solution that reduces the complexity inherent in distributed systems.

    The conversation explores topics such as developer experience, fault tolerance, state management, and the future of stream processing interfaces. Whether you’re a data engineer, application developer, or simply interested in the evolution of real-time data infrastructure, this episode offers valuable insights into making stream processing more accessible and efficient.

    Contacts & Links
    Amey Chaugule
    Matt Green
    Denormalized
    Denormalized Github Repo

    Chapters
    00:00 Introduction and Background
    12:03 Building an Embedded Stream Processing Engine
    18:39 The Need for Stream Processing in the Current Landscape
    22:45 Interfaces for Interacting with Stream Processing Systems
    26:58 The Target Persona for Stream Processing Systems
    31:23 Simplifying Stream Processing Workloads and State Management
    34:50 State and Buffer Management
    37:03 Distributed Computing vs. Single-Node Systems
    42:28 Cost Savings with Single-Node Systems
    47:04 The Power and Extensibility of DataFusion
    55:26 Integrating a Data Store with DataFusion
    57:02 The Future of Streaming Systems

    Click here to view the episode transcript.

    1 h 2 m
  • Unifying structured and unstructured data for AI: Rethinking ML infrastructure with Nikhil Simha and Varant Zanoyan
    Aug 30 2024

    In this episode, we dive deep into the future of data infrastructure for AI and ML with Nikhil Simha and Varant Zanoyan, two seasoned engineers from Airbnb and Facebook. Nikhil and Varant share their journey from building real-time data systems and ML infrastructure at tech giants to launching their own venture.

    The conversation explores the intricacies of designing developer-friendly APIs, the complexities of handling both batch and streaming data, and the delicate balance between customer needs and product vision in a startup environment.

    Contacts & Links

    Nikhil Simha
    Varant Zanoyan
    Chronon project

    Chapters

    00:00 Introduction and Past Experiences
    04:38 The Challenges of Building Data Infrastructure for Machine Learning
    08:01 Merging Real-Time Data Processing with Machine Learning
    14:08 Backfilling New Features in Data Infrastructure
    20:57 Defining Failure in Data Infrastructure
    26:45 The Choice Between SQL and DataFrame APIs
    34:31 The Vision for Future Improvements
    38:17 Introduction to Chronon and Open Source
    43:29 The Future of Chronon: New Computation Paradigms
    48:38 Balancing Customer Needs and Vision
    57:21 Engaging with Customers and the Open Source Community
    01:01:26 Potential Use Cases and Future Directions

    Click here to view the episode transcript.

    1 h 2 m
  • Stream processing, LSMs and leaky abstractions with Chris Riccomini
    Aug 23 2024

    In this episode, we chat with Chris Riccomini about the evolution of stream processing and the challenges of building applications on streaming systems. We also discuss leaky abstractions, good and bad API designs, what Chris loves and hates about Rust, and, finally, his exciting new project that combines object storage and LSMs.

    Connect with Chris at:
    LinkedIn
    X
    Blog
    Materialized View Newsletter - his newsletter
    The Missing README - his book
    SlateDB - his latest OSS project

    Chapters
    00:00 Introduction and Background
    04:05 The State of Stream Processing Today
    08:53 The Limitations of SQL in Streaming Systems
    14:00 Prioritizing the Developer Experience in Stream Processing
    18:15 Improving the Usability of Streaming Systems
    27:54 The Potential of State Machine Programming in Complex Systems
    32:41 The Power of Rust: Compiling and Language Bindings
    34:06 The Shift from Sidecar to Embedded Libraries Driven by Rust
    35:49 Building an LSM on Object Storage: Cost-Effective State Management
    39:47 The Unbundling and Composable Nature of Databases
    47:30 The Future of Data Systems: More Companies and Focus on Metadata

    Click here to view the episode transcript.

    53 m