Episodes

  • Data lineage and AI: Ensuring quality and compliance with Matt Barlin
    Jul 3 2024

    Ready to uncover the secrets of modern systems engineering and the future of AI? Join us for an enlightening conversation with Matt Barlin, the Chief Science Officer of Valence. Matt's extensive background in systems engineering and data lineage sets the stage for a fascinating discussion. He sheds light on the historical evolution of the field, the critical role of documentation, and the early detection of defects in complex systems. This episode promises to expand your understanding of model-based systems and data issues, offering valuable insights that only an expert of Matt's caliber can provide.

    In the heart of our episode, we dive into the fundamentals and transformative benefits of data lineage in AI. Matt draws intriguing parallels between data lineage and the engineering life cycle, stressing the importance of tracking data origins, access rights, and verification processes. Discover how decentralized identifiers are paving the way for individuals to control and monetize their own data. With the phasing out of third-party cookies and the challenges of human-generated training data shortages, we explore how systems like retrieval-augmented generation (RAG) and compliance regulations like the EU AI Act are shaping the landscape of AI data quality and compliance.

    Don’t miss this thought-provoking episode that promises to keep you at the forefront of responsible AI.

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    28 mins
  • Differential privacy: Balancing data privacy and utility in AI
    Jun 4 2024

    Explore the basics of differential privacy and its critical role in protecting individual anonymity. The hosts explain the latest guidelines and best practices for applying differential privacy to data used in models such as AI systems. Learn how this method ensures that personal data remains confidential, even when datasets are analyzed or hacked.

    Show notes

    • Intro and AI news (00:00)
      • Google AI search tells users to glue pizza and eat rocks
      • Gary Marcus on break? (Maybe an X-only break)
    • What is differential privacy? (06:34)
      • Differential privacy is a process for sensitive data anonymization that offers each individual in a dataset the same privacy they would experience if they were removed from the dataset entirely.
      • NIST’s recent paper SP 800-226 IPD: “Any privacy harms that result from a differentially private analysis could have happened if you had not contributed your data.”
      • There are two main types of differential privacy: global (NIST calls it Central) and local
    • Why should people care about differential privacy? (11:30)
      • Organizations are increasingly motivated to intentionally and systematically prioritize the privacy and safety of user data
      • Speed up deployments of AI systems for enterprise customers since connections to raw data do not need to be established
      • Increase data security for customers that utilize sensitive data in their modeling systems
      • Minimize the risk of exposing the sensitive data you're privileged to hold - i.e., don't be THAT organization
    • Guidelines and resources for applied differential privacy
      • Guidelines for Evaluating Differential Privacy Guarantees (NIST SP 800-226)
      • NIST De-Identification
    • Practical examples of applied differential privacy (15:58)
      • Continuous Features - cite: Dwork, McSherry, Nissim, and Smith's seminal 2006 paper, “Calibrating Noise to Sensitivity in Private Data Analysis,” which introduces the concept of ε-differential privacy
      • Categorical Features - cite: Warner (1965) created a randomized response technique in his paper “Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias” (both mechanisms are sketched in code after these notes)
    • Summary and key takeaways (23:59)
      • Differential privacy is going to be a part of how many of us need to manage data privacy
      • Useful when data providers can’t give us anonymized data for analysis, or when anonymization alone isn’t enough for our privacy needs
      • Hopeful that cohort targeting takes over for individual targeting
      • Remember: Differential privacy does not prevent bias!
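
    For readers who want to experiment, here is a minimal Python sketch of the two mechanisms cited in the practical examples above: the Laplace mechanism for a continuous query (ε-differential privacy) and Warner-style randomized response for a yes/no question. It assumes only numpy; the data, parameters, and function names are illustrative, not from the episode.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def dp_mean(values, lower, upper, epsilon):
        """Laplace mechanism: epsilon-DP mean of values clipped to [lower, upper].

        The clipped mean has sensitivity (upper - lower) / n, so Laplace noise
        with scale sensitivity / epsilon satisfies epsilon-differential privacy.
        """
        clipped = np.clip(values, lower, upper)
        sensitivity = (upper - lower) / len(clipped)
        return clipped.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    def randomized_response(answer, p_truth=0.75):
        """Warner-style randomized response for one yes/no answer (local DP).

        Each respondent tells the truth with probability p_truth and flips the
        answer otherwise, giving every individual plausible deniability.
        """
        return answer if rng.random() < p_truth else not answer

    # Global DP: release a private mean age from illustrative data.
    ages = rng.integers(18, 90, size=1000)
    print("private mean age:", dp_mean(ages, lower=18, upper=90, epsilon=0.5))

    # Local DP: privatize each survey answer, then debias the aggregate.
    true_answers = rng.random(1000) < 0.30            # true "yes" rate of 30%
    noisy = np.array([randomized_response(a) for a in true_answers])
    p = 0.75
    # E[noisy mean] = (1 - p) + (2p - 1) * q, so solve for q:
    estimate = (noisy.mean() - (1 - p)) / (2 * p - 1)
    print("estimated yes-rate:", round(estimate, 3))  # close to 0.30
    ```

    Note that ε composes across queries: releasing several statistics from the same data spends privacy budget each time, which is part of why applied guidance like NIST SP 800-226 matters.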


    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    28 mins
  • Responsible AI: Does it help or hurt innovation? With Anthony Habayeb
    May 7 2024

    Artificial Intelligence (AI) stands at a unique intersection of technology, ethics, and regulation. The complexities of responsible AI are brought into sharp focus in this episode featuring Anthony Habayeb, CEO and co-founder of Monitaur. As responsible AI is scrutinized for its role in profitability and innovation, Anthony and our hosts discuss the imperatives of safe and unbiased modeling systems, the role of regulations, and the importance of ethics in shaping AI.

    Show notes

    Prologue: Why responsible AI? Why now? (00:00:00)

    • Deviating from our normal topics about modeling best practices
    • Context about where regulation plays a role in industries besides big tech
    • Can we learn from other industries about the role of "responsibility" in products?

    Special guest, Anthony Habayeb (00:02:59)

    • Introductions and start of the discussion
    • Of all the companies you could build around AI, why governance?

    Is responsible AI the right phrase? (00:11:20)

    • Should we even call good modeling and business practices "responsible AI"?
    • Is having responsible AI a “want to have” or a “need to have”?

    Importance of AI regulation and responsibility (00:14:49)

    • People in the AI and regulation worlds have started pushing back on Responsible AI.
    • Do regulations impede freedom?
    • Discussing the big picture of responsibility and governance: Explainability, repeatability, records, and audit

    What about bias and fairness? (00:22:40)

    • You can have fair models that operate with bias
    • Bias in practice identifies inequities that models have learned
    • Fairness is correcting for societal biases to level the playing field so that safer business and modeling practices can prevail.

    Responsible deployment and business management (00:35:10)

    • Discussion about what organizations get right about responsible AI
    • And what organizations can get completely wrong if they aren't careful.

    Embracing responsible AI practices (00:41:15)

    • Getting your teams, companies, and individuals involved in the movement towards building AI responsibly

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    46 mins
  • Baseline modeling and its critical role in AI and business performance
    Apr 17 2024

    Baseline modeling is a necessary part of model validation. In our expert opinion, it should be required before model deployment. There are many types of baseline models, and in this episode we discuss their use cases, strengths, and weaknesses. We're sure you'll appreciate a fresh take on how to improve your modeling practices.

    Show notes

    Introductions and news: why reporting and visibility are a good thing for AI 0:03

    • Spoiler alert: Providing visibility to AI bias audits does NOT mean exposing trade secrets. Some reports claim otherwise.
    • Discussion about AI regulation in the context of current events and how regulation is playing out between Boeing and the FAA (tbc)

    Understanding baseline modeling for machine learning 7:41

    • Establishing baselines allows us to understand how models perform relative to simple rules-based models, aka heuristics.
    • Reporting results without baselines to compare against is like giving a movie a rating of 5 without telling the listener that you were using a 10-point scale.
    • Baseline modeling comparisons are part of rigorous model validations and should always be conducted during early model development and final production deployment.
    • Baselines pair with analyses of theoretical upper bounds for modeling performance to show where your technique falls between acceptable worst-case and best-case performance.
    • We often find complex models deployed in the real world that haven’t proven their value over simpler, explainable baseline models.

    Classification baselines and model performance comparison 19:40

    • Uniform Random Selection - simulates how your model does against a baseline that guesses classes at random, like rolling dice.
    • Most Frequent Class (MFC) - often the most telling test, especially in the case of highly skewed data paired with inappropriate metrics.
    • Single-feature modeling - Validates how much the complex signal from your data and model improves over a bare minimum explainable model.
    • And more… (three of these baselines are sketched in code after this list)
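
    A minimal sketch of the first three baselines, assuming scikit-learn and an illustrative skewed dataset (the data, metric choice, and parameters are our own assumptions for demonstration):

    ```python
    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Illustrative, skewed binary dataset (placeholder for your real data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 10))
    y = (X[:, 0] + 0.5 * rng.normal(size=5000) > 1.6).astype(int)  # ~8% positives
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Uniform random and most-frequent-class baselines come built in.
    uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X_tr, y_tr)
    mfc = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
    # Single-feature baseline: a simple model trained on one column only.
    single = LogisticRegression().fit(X_tr[:, :1], y_tr)

    for name, preds in [
        ("uniform random", uniform.predict(X_te)),
        ("most frequent class", mfc.predict(X_te)),
        ("single feature", single.predict(X_te[:, :1])),
    ]:
        print(f"{name:>20}: F1 = {f1_score(y_te, preds, zero_division=0):.3f}")
    ```

    If your production candidate can't clearly beat all three, that is the signal (per the conclusions below) to rethink the approach.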

    Exploring regression and more advanced baselines for modeling 24:11

    • Regression baselines: mean, median, mode, single-variable linear regression, lag-1, and least-5% re-interpretation (several are sketched in code after this list)
    • Advanced baselines in language and vision
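
    A matching sketch for two of the regression baselines plus lag-1, again on illustrative data; scikit-learn's DummyRegressor covers the mean and median cases directly:

    ```python
    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.metrics import mean_absolute_error

    # Illustrative noisy time series (placeholder for your real target).
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=500)) + rng.normal(scale=0.5, size=500)
    train, test = y[:400], y[400:]
    X_tr, X_te = np.zeros((400, 1)), np.zeros((100, 1))  # features unused by dummies

    for strategy in ("mean", "median"):
        dummy = DummyRegressor(strategy=strategy).fit(X_tr, train)
        err = mean_absolute_error(test, dummy.predict(X_te))
        print(f"{strategy:>8} baseline MAE: {err:.3f}")

    # Lag-1 baseline: tomorrow's prediction is simply today's value.
    lag1_preds = np.concatenate(([train[-1]], test[:-1]))
    print(f"   lag-1 baseline MAE: {mean_absolute_error(test, lag1_preds):.3f}")
    ```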

    Conclusions 35:39

    • Baseline modeling is a necessary part of model validation
    • There are differing flavors of baselines that are appropriate for all types of modeling
    • Baselines are needed to establish fair and realistic lower bounds for performance
    • If your model can’t perform significantly better than a baseline, consider scrapping the model and trying a new approach

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    36 mins
  • Information theory and the complexities of AI model monitoring
    Mar 26 2024

    In this episode, we explore information theory and the not-so-obvious shortcomings of its popular metrics for model monitoring, and where non-parametric statistical methods can serve as the better option.

    Introduction and latest news 0:03

    • Gary Marcus has written an article questioning the hype around generative AI, suggesting it may not be as transformative as previously thought.
    • This stands in contrast to announcements out of the NVIDIA conference during the same week.

    Information theory and its applications in AI 3:45

    • The importance of information theory in computer science, citing its applications in cryptography and communication.
    • The basics of information theory, including the concept of entropy, which measures the uncertainty of a random variable.
    • Information theory as a fundamental discipline in computer science, and how it has been applied in recent years, particularly in the field of machine learning.
    • The speakers clarify the difference between a metric and a divergence, which is crucial to understanding how information theory is being misapplied in some cases.

    Information theory metrics and their limitations 7:05

    • Divergences are a type of measurement that don't follow simple rules like distance, and they have some nice properties but can be troublesome in certain use cases.
    • KL divergence is a popular test for monitoring changes in data distributions, but it is not symmetric and can lead to incorrect comparisons.
    • Sid explains that KL divergence measures the expected excess surprisal (an entropy difference) incurred when moving from one data distribution to another, and is not the same as the KS test (see the demonstration after this list).
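
    A quick numerical demonstration of both problems, using scipy's relative-entropy function on illustrative binned frequencies (the bins and numbers are ours, not from the episode):

    ```python
    import numpy as np
    from scipy.stats import entropy

    # Binned frequencies for one feature: training time (p) vs. today (q).
    p = np.array([0.70, 0.20, 0.10])
    q = np.array([0.50, 0.30, 0.20])

    print("KL(P||Q):", entropy(p, q))   # one number...
    print("KL(Q||P):", entropy(q, p))   # ...a different number: not symmetric

    # Zero handling: a bin with mass in P but none in Q makes KL infinite,
    # which breaks any fixed monitoring threshold.
    q_zero = np.array([0.80, 0.20, 0.00])
    print("KL(P||Q0):", entropy(p, q_zero))   # inf
    ```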

    Metrics for monitoring AI model changes 10:41

    • The limitations of KL divergence and its alternatives, including Jensen-Shannon divergence and the population stability index (both sketched in code after this list).
    • They highlight the issues with KL divergence, such as asymmetry and the handling of zeros, and the advantages of Jensen-Shannon divergence, which handles both issues, and the population stability index, which provides a quantitative measure of changes in model distributions.
    • The popularity of information theory metrics in AI and ML is largely due to legacy and a lack of understanding of the underlying concepts.
    • Information theory metrics may not be the best choice for quantifying change in risk in the AI and ML space, but they are the ones that are commonly used due to familiarity and ease of use.
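
    A minimal sketch of both alternatives: scipy ships a Jensen-Shannon distance, and PSI is short enough to hand-roll. The bin frequencies and the PSI thresholds quoted in the comment are common conventions, not figures from the episode:

    ```python
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    p = np.array([0.70, 0.20, 0.10])   # training-time bin frequencies
    q = np.array([0.50, 0.30, 0.20])   # live-traffic bin frequencies

    # Jensen-Shannon divergence: symmetric, finite, and bounded.
    # scipy returns the JS *distance*, so square it for the divergence.
    print("JS(P,Q):", jensenshannon(p, q) ** 2)
    print("JS(Q,P):", jensenshannon(q, p) ** 2)   # identical: symmetric

    def psi(expected, actual, eps=1e-6):
        """Population stability index over pre-binned frequencies.

        A common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
        > 0.25 major shift (a convention, not a law).
        """
        e = np.clip(expected, eps, None)
        a = np.clip(actual, eps, None)
        return float(np.sum((a - e) * np.log(a / e)))

    print("PSI:", psi(p, q))
    ```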

    Using nonparametric statistics in modeling systems 15:09

    • Information theory divergences are not useful for monitoring production model performance, according to the speakers.
    • Andrew Clark highlights the advantages of using nonparametric statistics in machine learning, including distribution agnosticism and the ability to test for significance without knowing the underlying distribution (a minimal example follows this list).
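
    A minimal example of the nonparametric approach described here, using the distribution-free two-sample Kolmogorov-Smirnov test from scipy on illustrative reference and live windows:

    ```python
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)

    # Reference window (training data) vs. a slightly drifted live window.
    # The two-sample KS test assumes nothing about either distribution.
    reference = rng.normal(loc=0.0, scale=1.0, size=2000)
    live = rng.normal(loc=0.3, scale=1.0, size=2000)

    stat, p_value = ks_2samp(reference, live)
    print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2e}")
    if p_value < 0.01:
        print("Significant distribution shift: investigate the model inputs.")
    ```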

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    22 mins
  • The importance of anomaly detection in AI
    Mar 6 2024

    In this episode, the hosts focus on the basics of anomaly detection in machine learning and AI systems, including its importance, and how it is implemented. They also touch on the topic of large language models, the (in)accuracy of data scraping, and the importance of high-quality data when employing various detection methods. You'll even gain some techniques you can use right away to improve your training data and your models.

    Intro and discussion (0:03)

    • Questions about Information Theory from our non-parametric statistics episode.
    • Google CEO calls out chatbots (WSJ)
    • A statement about anomaly detection as it was regarded in 2020 (Forbes)
    • In the year 2024, are we using AI to detect anomalies, or are we detecting anomalies in AI? Both?

    Understanding anomalies and outliers in data (6:34)

    • Anomalies, or outliers, are data points so unexpected that their inclusion raises warning flags about inauthentic or misrepresented data collection.
    • The detection of these anomalies is present in many fields of study but canonically in: finance, sales, networking, security, machine learning, and systems monitoring
    • A well-controlled modeling system should have few outliers
    • Where anomalies come from, including data entry mistakes, data scraping errors, and adversarial agents
    • Biggest dinosaur example: https://fivethirtyeight.com/features/the-biggest-dinosaur-in-history-may-never-have-existed/

    Detecting outliers in data analysis (15:02)

    • High-quality, highly curated data is crucial for effective anomaly detection.
    • Domain expertise plays a significant role in anomaly detection, particularly in determining what makes up an anomaly.

    Anomaly detection methods (19:57)

    • Discussion and examples of various methods used for anomaly detection (two of these are sketched in code after this list)
      • Supervised methods
      • Unsupervised methods
      • Semi-supervised methods
      • Statistical methods
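
    As a taste, here is a minimal sketch contrasting a statistical method (z-scores) with an unsupervised method (an isolation forest) on illustrative data; the 3-sigma threshold and contamination rate are our own assumptions, not recommendations from the episode:

    ```python
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(loc=50, scale=5, size=(980, 1))    # typical records
    weird = rng.uniform(low=90, high=120, size=(20, 1))    # injected anomalies
    data = np.vstack([normal, weird])

    # Statistical method: flag points more than 3 standard deviations out.
    z_scores = np.abs((data - data.mean()) / data.std())
    print("z-score flags:", int((z_scores > 3).sum()))

    # Unsupervised method: an isolation forest needs no labels at all.
    forest = IsolationForest(contamination=0.02, random_state=0).fit(data)
    print("isolation forest flags:", int((forest.predict(data) == -1).sum()))
    ```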

    Anomaly detection challenges and limitations (23:24)

    • Anomaly detection is a complex process that requires careful consideration of various factors, including the distribution of the data, the context in which the data is used, and the potential for errors in data entry
    • Perhaps we're detecting anomalies in human research design, not AI itself?
    • A simple first step to anomaly detection is to visually plot numerical fields. "Just look at your data, don't take it at face value and really examine if it does what you think it does and it has what you think it has in it." This basic practice, devoid of any complex AI methods, can be an effective starting point in identifying potential anomalies (see the sketch below).
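
    In that spirit, a minimal "just look at your data" sketch, assuming matplotlib and an illustrative numeric field with a few injected entry errors:

    ```python
    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng(0)
    # Illustrative "age" field with entry errors and 999 sentinels mixed in.
    ages = np.concatenate([rng.normal(40, 12, size=995), [-3, 0, 214, 999, 999]])

    # A plain histogram surfaces impossible values immediately, no AI required.
    plt.hist(ages, bins=60)
    plt.xlabel("age")
    plt.ylabel("count")
    plt.title("Just look at your data")
    plt.show()
    ```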

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    36 mins
  • What is consciousness, and does AI have it?
    Feb 13 2024

    We're taking a slight detour from modeling best practices to explore questions about AI and consciousness.

    With special guest Michael Herman, co-founder of Monitaur and TestDriven.io, the team discusses different philosophical perspectives on consciousness and how these apply to AI. They also discuss the potential dangers of AI in its current state and why starting fresh instead of iterating can make all the difference in achieving characteristics of AI that might resemble consciousness.

    Show notes

    Why consciousness for this episode?

    • Enough listeners have randomly asked the hosts if Skynet is on the horizon
    • Does modern or future AI have the wherewithal to take over the world, and is it even conscious or intelligent?
    • Do we even have a good definition of consciousness?

    Introducing Michael Herman as guest speaker

    • Co-founder of Monitaur, engineer extraordinaire, and creator of TestDriven.io, a training company focused on educating and upskilling mid-level to senior-level web developers.
    • Degree and studies in philosophy and technology

    Establishing the philosophical foundation of consciousness

    • Consciousness is around us everywhere. It can mean different things to different people.
    • Most discussion about the subject bypasses the Mind-Body Problem and a few key theories:
      • Dualism - the mind and body are distinct
      • Materialism - matter is king and consciousness arises in complex material systems
      • Panpsychism - consciousness is king. It underlies everything at the quantum level

    The potential dangers of achieving consciousness in AI

    • While there is potential for AI to reach consciousness, we're far from that point.
    • Dangers are more related to manipulation and misinformation, rather than the risk of conscious machines turning against humanity.

    The need for a new approach to developing AI systems

    • There's a need to start from scratch if the goal is to achieve consciousness in AI systems.
    • Current modeling techniques might not lead to AI achieving consciousness. A new paradigm might be required.
    • There's a need to define what consciousness in AI means and to develop a test for it.

    Final thoughts and wrap-up

    • If consciousness is truly the goal, the case for starting from scratch allows for fairness and ethics to be established foundationally
    • AI systems should be built with human values in mind

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    33 mins
  • Upskilling for AI: Roles, organizations, and new mindsets
    Jan 25 2024

    Data scientists, researchers, engineers, marketers, and risk leaders find themselves at a crossroads: expand their skills or risk obsolescence. The hosts discuss how a growth mindset and "the fundamentals" of AI can help.

    Our episode shines a light on this vital shift, equipping listeners with strategies to elevate their skills and integrate multidisciplinary knowledge. We share stories from the trenches on how each role contributes to robust AI solutions that adhere to ethical standards, and how embracing a T-shaped model of expertise can empower data scientists to lead the charge in industry-specific innovations.

    Zooming out to the executive suite, we dissect the complex dance of aligning AI innovation with core business strategies. Business leaders take note as we debunk the myth of AI as a panacea and advocate for a measured, customer-centric approach to technology adoption. We emphasize the decisive role executives play in steering their companies through the AI terrain, ensuring that every technological choice propels the business forward, overcoming the ephemeral allure of AI trends.

    Suggested courses, public offerings:

    • Undergrad level Stanford course (Coursera): Machine Learning Specialization
    • Graduate-level MIT Open Courseware: Machine Learning

    We hope you enjoy this candid conversation that could reshape your outlook on the future of AI and the roles and responsibilities that support it.

    Resources mentioned in this episode

    • LinkedIn's jobs on the rise 2024
    • 3 questions to separate AI from marketing hype
    • Disruption or distortion? The impact of AI on future operating models
    • The Obstacle is the Way by Ryan Holiday

    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    41 mins