Building the Backend: Data Solutions that Power Leading Organizations

Episodios

The Analytics Engine for All Your Data with Justin Borgman @ Starburst

Mar 15 2022
In this episode we speak with Justin Borgman, Chairman & CEO at Starburst, which is based on open source Trino (formerly PrestoSQL) and was recently valued at $3.35 billion after securing their series D funding. In this episode we discuss convergence of DW’s / DL's, why data lakes fail and much much more.
Top 3 takeaways
The data mesh architecture is gaining adoption more quickly in Europe due to GDPR.
There were two main limitations of data lakes when comparing to DW’s, performance and CRUD operations. Performance has been resolved with query engines like Starburst and tools like Apache Iceberg, Apache Hudi and Delta Lake are starting to close the gap with CRUD operations.
The principle of a single source of truth / storing everything in a single DL or DW is not always feasible or possible depending on regulations. Starburst is bridging that gap and enabling data mesh and data fabric architectures.
Más Menos
36 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
Transform Your Object Storage Into a Git-like Repository With Paul Singman @ LakeFS

Mar 1 2022
In this episode we speak with Paul Singman Developer Advocate at Treeverse / LakeFS. LakeFS is an open source project that allows you to transform your object storage into a Git-like repository.
Top 3 takeaways
LakeFS enables use cases like debugging to quickly view historical versions of your data at a specific point in time and running ML experiments over the same set of data with branching..
The current data landscape is very fragmented with many tools available.. Over the coming years there will most likely be consolidation of tools that are more open and integrated.
Data quality and observability continue to be key components of successful data lakes and having visibility into job runs.
Más Menos
27 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
Enable Faster Data Processing and Access with Apache Arrow with Matt Topol @ Factset

Feb 1 2022
In this episode we speak with Matt Topol, Vice President, Principal Software Architect @ FactSet and dive deep into how they are taking advantage of Apache Arrow for faster processing and data access.
Below are the top 3 value bombs:
Apache Arrow is an open-source in-memory columnar format that creates a standard way to share and process data structures.
Apache Arrow Flight eliminates serialization and deserialization which enables faster access to query results compared to traditional JDBC and ODBC interfaces.
Don’t put all your eggs in one basket, whether you're using commercial products or open source, make sure you design a modular architecture that does not tie you down to any one piece of technology.
Más Menos
49 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
Implementing Amundsen @ Convoy with Chad Sanderson

Jan 25 2022
In this episode we speak with Chad Sanderson head of data and early stage startup advisor focused on data innovation @ Convoy and uncover their journey to implementing Amundsen, an open source data catalog.

Below are the top 3 value bombs:
Data Scientist’s should not be spending the majority of their time trying to find the data they are interested in.
Amundsen is a powerful open source data catalog that integrates across your data landscape to provide visibility into your data assets and lineage.
We often get lost in the features within data teams. It’s important to take a step back and understand how you're impacting the bottom line of the business.
Más Menos
36 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
The Importance of Treating Your Data Initiatives as Products with Murali Bhogavalli

Jan 18 2022
Your data team should not just be keeping the lights on, but should be building and creating data products to support the business. In this episode we speak with Murali Bhogavalli a data product manager and explore what is a data product manager and how they differ from a traditional product manager.
Below are the top 3 value bombs:
Data should be looked at as a product and treated as such within the organization (i.e. agile methodologies, continuous improvement…)

Organizations need to be more than just data driven but also data informed. For that to happen, you need to build data literacy into your ecosystem by helping everybody understand what the data means and where is it coming from and the quality of it..

Product managers typically use data to deliver the outcomes. But for a data PM, data is the deliverable and it also the outcome.
Más Menos
27 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
Open-Source Data Catalog Amundsen with Mark Grover @ Stemma

Jan 11 2022
In this episode of Building The Backend we hear from Mark Grover founder @ Stemma, co-creator of Amundsen. Stemma is a fully managed data catalog, powered by the leading open-source data catalog, Amundsen.
Below are top 3 value bombs:
Automated data catalogs are critical to help wrangle the growing data across organizations. (i.e. Being able to identify out of 150 columns on this table only 10 are being used downstream)
Tribal knowledge and context cannot be automated - data catalogs cannot be 100% automated.
Amundsen is an open-source data catalog originally created at Lyft. Stemma has created a managed version of Amundsen.
Help me improve the podcast by completing this 60 second survey: https://buildingthebackend.com/survey
Más Menos
41 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
Architecting a Modern Data Lake with Dipti Borkar from Ahana

Nov 9 2021
In this episode of Building The Backend we hear from Dipti Borkar cofounder @ Ahana a managed service for Presto on AWS, where we talk all about the data lake, how it should be structured and where the industry is going.

Below are top 3 value bombs:
Presto is an open source distributed SQL query engine originally created by Facebook, mainly used to run SQL queries on data lakes but can be connected to relational data stores as well. Ahana is a managed Presto service on AWS with 3x price/performance.
When optimizing your data lake, it’s normally best to store the data in Parquet or ORC format vs JSON or CSV as they are columnar formats that can have indexes built in.
Data Lake Houses are continuing to gain popularity by bringing the benefits of your data lake and data warehouse together with the help of tools like Databricks DeltaLake and Apache HUDI.
Más Menos
40 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis
Open Source BI with Apache Superset

Nov 2 2021
What tools are you using for data viz? Are they low cost? One option is Apache Superset, in this episode we speak with Robert Stolz to learn more about Superset and other open source data tools.
Top 3 Value Bombs:
One popular use case with Apache Superset is embedding it within applications because it’s open source, there is a wide range of flexibility to integrate it with existing systems.
Apache Superset supports any sources supported by the Python SQL toolkit called SQLAlchemy.
DBT encourages a set of best practices around data development (i.e. source control and test driven development).
Más Menos
29 m

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Escúchala gratis

Comienza Ahora

Listas Populares

Explora Audible

Episodios

The Analytics Engine for All Your Data with Justin Borgman @ Starburst

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

Transform Your Object Storage Into a Git-like Repository With Paul Singman @ LakeFS

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

Enable Faster Data Processing and Access with Apache Arrow with Matt Topol @ Factset

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

Implementing Amundsen @ Convoy with Chad Sanderson

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

The Importance of Treating Your Data Initiatives as Products with Murali Bhogavalli

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

Open-Source Data Catalog Amundsen with Mark Grover @ Stemma

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

Architecting a Modern Data Lake with Dipti Borkar from Ahana

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast

Open Source BI with Apache Superset

No se pudo agregar al carrito

Add to Cart failed.

Error al Agregar a Lista de Deseos.

Error al eliminar de la lista de deseos.

Error al añadir a tu biblioteca

Error al seguir el podcast

Error al dejar de seguir el podcast