Unleashing the power of Presto: The Uber case study

by crypetonews
September 25, 2023
in Blockchain

The magic behind Uber’s data-driven success

Uber, the ride-hailing giant, is a household name worldwide. We all recognize it as the platform that connects riders with drivers for hassle-free transportation. But what most people don’t realize is that behind the scenes, Uber is not just a transportation service; it’s a data and analytics powerhouse. Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success.

Uber’s DNA as an analytics company

At its core, Uber’s business model is deceptively simple: connect a customer at point A to their destination at point B. With a few taps on a mobile device, riders request a ride; then, Uber’s algorithms work to match them with the nearest available driver and calculate the optimal price. But the simplicity ends there. Every transaction, every cent matters. A ten-cent difference in each transaction translates to a staggering $657 million annually. Uber’s prowess as a transportation, logistics and analytics company hinges on their ability to leverage data effectively.

The pursuit of hyperscale analytics

The scale of Uber’s analytical endeavor requires data platforms chosen for their ability to scale analytical processing without practical limits. Consider the magnitude of Uber’s footprint.[1] The company operates in more than 10,000 cities with more than 18 million trips per day. To maintain analytical superiority, Uber keeps 256 petabytes of data in store and processes 35 petabytes of data every day. They support 12,000 monthly active users of analytics running more than 500,000 queries every single day.
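The $657 million figure quoted earlier follows directly from this trip volume. A quick check, assuming the ten-cent difference applies to every one of the 18 million daily trips over a 365-day year:

```python
# Back-of-the-envelope check of the per-transaction arithmetic.
# Assumption: the ten-cent difference applies to every trip, 365 days a year.
TRIPS_PER_DAY = 18_000_000   # trips per day, from the figures above
CENTS_PER_TRIP = 10          # the "ten-cent difference"
DAYS_PER_YEAR = 365

# Work in whole cents to avoid floating-point rounding, then convert to dollars.
annual_impact_usd = TRIPS_PER_DAY * CENTS_PER_TRIP * DAYS_PER_YEAR // 100
print(f"${annual_impact_usd:,}")  # $657,000,000
```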

To power this mammoth analytical undertaking, Uber chose the open source Presto distributed query engine. Teams at Facebook developed Presto to handle high numbers of concurrent queries on petabytes of data and designed it to scale up to exabytes of data. Presto achieves this level of scalability by completely separating analytical compute from data storage, which allowed its developers to focus on SQL-based query optimization to the nth degree.

What is Presto?

Presto is an open source distributed SQL query engine for data analytics and the data lakehouse, designed for running interactive analytic queries against datasets of all sizes, from gigabytes to petabytes. Its cost-based query optimizer, dynamic filtering and extensibility through user-defined functions make it a versatile tool in Uber’s analytics arsenal. To scale and to support a broad range of analytical use cases, Presto separates analytical processing from data storage. A query first passes through the cost-based optimizer; the data is then accessed through connectors, cached for performance and analyzed across the servers in a cluster. Because of this distributed design, Presto scales to petabytes and exabytes of data.
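Dynamic filtering is easiest to see in miniature. The sketch below is not Presto’s implementation; it only illustrates the idea with plain Python lists, and the table contents and the function name `join_with_dynamic_filter` are made up for the example. The join keys from the small, already-filtered side are collected first, and rows on the large side whose key cannot match are skipped before any join work is done.

```python
def join_with_dynamic_filter(build_rows, probe_rows, key):
    """Hash join that skips probe rows whose key cannot match.

    build_rows: the small side of the join, already filtered.
    probe_rows: the large side of the join.
    """
    # 1. Build a hash table from the small side.
    build_index = {}
    for row in build_rows:
        build_index.setdefault(row[key], []).append(row)

    # 2. The set of build-side keys acts as the dynamic filter.
    dynamic_filter = set(build_index)

    joined, skipped = [], 0
    for row in probe_rows:
        if row[key] not in dynamic_filter:
            skipped += 1  # a real engine would try to avoid reading this data at all
            continue
        for match in build_index[row[key]]:
            joined.append({**match, **row})
    return joined, skipped


# Hypothetical data: a selective filter leaves one city on the build side.
cities = [{"city_id": 7, "city": "Lisbon"}]
trips = [{"trip_id": i, "city_id": i % 10} for i in range(1000)]

rows, skipped = join_with_dynamic_filter(cities, trips, "city_id")
print(len(rows), skipped)  # 100 900
```

In this toy run, nine out of ten rows on the large side are discarded by a set lookup before the join touches them, which is the effect the technique aims for on selective joins.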

The evolution of Presto at Uber

Beginning of a data analytics journey

Uber began their analytical journey with a traditional analytical database platform at the core of their analytics. However, as their business grew, so did the amount of data they needed to process and the number of insight-driven decisions they needed to make. Uber soon ran up against the cost and constraints of traditional analytics, which forced them to look elsewhere for a solution.

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. This side-by-side strategy enabled data capture, but they quickly discovered that while the data lake worked well for long-running queries, it was not fast enough to support the near-real-time engagement necessary to maintain a competitive advantage.

To address their performance needs, Uber chose Presto because of its ability, as a distributed platform, to scale linearly and because of its commitment to ANSI SQL, the lingua franca of analytical processing. They set up a couple of clusters and began processing queries at a much faster speed than anything they had experienced with Apache Hive, a distributed data warehouse system, on their data lake.

Continued high growth

As the use of Presto continued to grow, Uber joined the Presto Foundation, the neutral governing body behind the Presto open source project, as a founding member alongside Facebook. Their initial contributions were based on their need for growth and scalability. Uber focused on contributing to several key areas within Presto:

Automation: To support growing usage, the Uber team went to work on automating cluster management so that clusters are simple to keep up and running. Automation enabled Uber to grow to their current state with more than 256 petabytes of data, 3,000 nodes and 12 clusters. They also put process automation in place to quickly set up and take down clusters.

Workload Management: Because different kinds of queries have different requirements, Uber made sure that traffic is well-isolated. This enables them to batch queries based on speed or accuracy. They have even created subcategories for a more granular approach to workload management.

Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data. Large, untested workloads run the risk of hogging all the resources. In some cases, the queries run out of memory and do not complete.

To address this challenge, Uber created and maintains sample versions of datasets. If they know a certain user is doing exploratory work, they simply route them to the sampled datasets. This way, the queries run much faster. There may be inaccuracy because of sampling, but it allows users to discover new viewpoints within the data. If the exploratory work needs to move on to testing and production, they can plan appropriately.
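One way to picture this routing is as a small rewrite step in front of the query engine. The sketch below is an assumption about how such a step could look, not Uber’s code: the `_sampled` table suffix, the set of sampled tables, the query text and the `exploratory` flag are all hypothetical.

```python
# Hypothetical registry of tables that have a sampled counterpart.
SAMPLED_TABLES = {"trips", "orders"}
SAMPLE_SUFFIX = "_sampled"  # assumed naming convention


def resolve_table(table: str, exploratory: bool) -> str:
    """Return the table a query should read from."""
    if exploratory and table in SAMPLED_TABLES:
        return table + SAMPLE_SUFFIX
    return table


def build_query(table: str, exploratory: bool) -> str:
    """Build an illustrative aggregation query against the resolved table."""
    target = resolve_table(table, exploratory)
    return f"SELECT city_id, count(*) AS n FROM {target} GROUP BY city_id"


print(build_query("trips", exploratory=True))   # reads from trips_sampled
print(build_query("trips", exploratory=False))  # reads from trips
```

Tables without a sampled version fall through unchanged, so exploratory users are never blocked; they simply run against the full dataset when no sample exists.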

Security: Uber adapted Presto to take users’ credentials and pass them down to the storage layer, specifying the precise data to which each user has access permissions. As Uber has done with many of its additions to Presto, they contributed their security upgrades back to the open source Presto project.

The technical value of Presto at Uber

Analyzing complex data types with Presto

As a digital native company, Uber continues to expand its use cases for Presto. For traditional analytics, they are bringing data discipline to their use of Presto. They ingest data in snapshots from operational systems. It lands as raw data in HDFS. Next, they build model data sets out of the snapshots, cleanse and deduplicate the data, and prepare it for analysis as Parquet files.

For more complex data types, Uber uses Presto’s complex SQL features and functions, especially when dealing with nested or repeated data, time-series data or data types like maps, arrays, structs and JSON. For example, a Parquet file can store JSON documents as BLOBs within a column. Uber users can run a Presto query that extracts the JSON from that column and keeps only the fields the query asks for. The caveat is that doing this defeats the purpose of Parquet’s columnar layout, because the whole blob must be read and parsed to reach a single field. It is a quick way to do the analysis, but it sacrifices some performance. Presto also applies dynamic filtering, which can significantly improve the performance of queries with selective joins by avoiding reading data that would be filtered out by the join conditions.
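The performance trade-off of keeping JSON in a blob column can be illustrated without Presto. In the sketch below, which uses made-up data and plain Python structures rather than real Parquet files, a value stored in its own column is read directly, while a value buried in a JSON blob forces every blob to be parsed in full just to reach one field.

```python
import json

# Hypothetical rows: "fare" is a proper column, the rest lives in a JSON blob.
columnar = {
    "trip_id": [1, 2, 3],
    "fare": [12.5, 8.0, 30.25],
    "payload": [
        json.dumps({"rider": {"tier": "gold"}, "stops": 1}),
        json.dumps({"rider": {"tier": "basic"}, "stops": 0}),
        json.dumps({"rider": {"tier": "gold"}, "stops": 2}),
    ],
}

# Columnar access: only the "fare" column is touched.
total_fare = sum(columnar["fare"])

# Blob access: every payload is parsed in full to extract a single nested field.
gold_trips = [
    trip_id
    for trip_id, blob in zip(columnar["trip_id"], columnar["payload"])
    if json.loads(blob)["rider"]["tier"] == "gold"
]

print(total_fare)  # 50.75
print(gold_trips)  # [1, 3]
```

In Presto SQL, JSON functions such as `json_extract_scalar` play the role that `json.loads` plays here.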

Extending the analytical capabilities and use cases of Presto

To extend the analytical capabilities of Presto, Uber uses many out-of-the-box functions provided with the open source software. Presto provides a long list of functions, operators and expressions as part of its open source offering, including standard, map, array, mathematical and statistical functions. Presto also makes it easy for Uber to define their own functions. For example, tied closely to their digital business, Uber has created their own geospatial functions.
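Uber’s geospatial functions are not described here, so the following is only an illustration of the kind of logic such a function might wrap: a great-circle (haversine) distance between two coordinates, written as a standalone Python function. The function name and the mean Earth radius of 6,371 km are choices made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points a quarter of the way around the equator from each other.
print(round(haversine_km(0.0, 0.0, 0.0, 90.0)))  # 10008
```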

Uber chose Presto for the flexibility it provides with compute separated from data storage. As a result, they continue to expand their use cases to include ETL, data science, data exploration, online analytical processing (OLAP), data lake analytics and federated queries.

Pushing the real-time boundaries of Presto

Uber also upgraded Presto to support real-time queries and to run a single query across data in motion and data at rest. To support very low latency use cases, Uber runs Presto as a microservice on their infrastructure platform and moves transaction data from Kafka into Apache Pinot, a real-time distributed OLAP data store, used to deliver scalable, real-time analytics.

According to the Apache Pinot website, “Pinot is a distributed and scalable OLAP (Online Analytical Processing) datastore, which is designed to answer OLAP queries with low latency. It can ingest data from offline batch data sources (such as Hadoop and flat files) as well as online data sources (such as Kafka). Pinot is designed to scale horizontally, so that it can handle large amounts of data. It also provides features like indexing and caching.”

This combination supports a high volume of low-latency queries. For example, Uber has created a dashboard called Restaurant Manager in which restaurant owners can look at orders in real time as they are coming into their restaurants. Uber has made the Presto query engine connect to real-time databases.

To summarize, here are some of the key differentiators of Presto that have helped Uber:

Speed and Scalability: Presto’s ability to handle massive amounts of data and process queries at lightning speed has accelerated Uber’s analytics capabilities. This speed is essential in a fast-paced industry where real-time decision-making is paramount.

Self-Service Analytics: Presto has democratized data access at Uber, allowing data scientists, analysts and business users to run their queries without relying heavily on engineering teams. This self-service analytics approach has improved agility and decision-making across the organization.

Data Exploration and Innovation: The flexibility of Presto has encouraged data exploration and experimentation at Uber. Data professionals can easily test hypotheses and gain insights from large and diverse datasets, leading to continuous innovation and service improvement.

Operational Efficiency: Presto has played a crucial role in optimizing Uber’s operations. From route optimization to driver allocation, the ability to analyze data quickly and accurately has led to cost savings and improved user experiences.

Federated Data Access: Presto’s support for federated queries has simplified data access across Uber’s various data sources, making it easier to harness insights from multiple data stores, whether on-premises or in the cloud.

Real-Time Analytics: Uber’s integration of Presto with real-time data stores like Apache Pinot has enabled the company to provide real-time analytics to users, enhancing their ability to monitor and respond to changing conditions rapidly.

Community Contribution: Uber’s active participation in the Presto open source community has not only benefited their own use cases but has also contributed to the broader development of Presto as a powerful analytical tool for organizations worldwide.

The power of Presto in Uber’s data-driven journey

Today, Uber relies on Presto to power some impressive metrics. From their latest Presto presentation in August 2023, here’s what they shared:

Uber’s success as a data-driven company is no accident. It’s the result of a deliberate strategy to leverage cutting-edge technologies like Presto to unlock the insights hidden in vast volumes of data. Presto has become an integral part of Uber’s data ecosystem, enabling the company to process petabytes of data, support diverse analytical use cases, and make informed decisions at an unprecedented scale.

Getting started with Presto

If you’re new to Presto and want to check it out, we recommend this Getting Started page where you can try it out.

Alternatively, if you’re ready to get started with Presto in production you can check out IBM watsonx.data, a Presto-based open data lakehouse. Watsonx.data is a fit-for-purpose data store, built on an open lakehouse architecture, supported by querying, governance and open data formats to access and share data.


[1] Uber. EMA Technical Case Study, sponsored by Ahana. Enterprise Management Associates (EMA). 2023.

Chair, Presto Community Team and Community at IBM


