Wednesday, June 17, 2026
Crypeto News

IBM Research Unveils Cost-Effective AI Inferencing with Speculative Decoding

by crypetonews
June 24, 2024
in Blockchain

IBM Research has announced an advance in AI inferencing that combines speculative decoding with paged attention to improve the cost performance of large language models (LLMs). According to IBM Research, the development promises to make customer care chatbots more efficient and cost-effective.

In recent years, LLMs have improved the ability of chatbots to understand customer queries and provide accurate responses. However, the high cost and slow speed of serving these models have hindered broader AI adoption. Speculative decoding is an optimization technique that accelerates AI inferencing by generating tokens faster. It can cut latency by a factor of two to three, improving the customer experience.

Reducing latency, however, traditionally comes with a trade-off: lower throughput, meaning fewer users can use the model at the same time, which raises operational costs. IBM Research has tackled this challenge by cutting the latency of its open-source Granite 20B code model in half while quadrupling its throughput.

Speculative Decoding: Efficiency in Token Generation

LLMs use a transformer architecture, which is inefficient at generating text. Typically, producing each new token requires a forward pass that processes the previously generated tokens, so text is generated one token at a time. Speculative decoding modifies this process to evaluate several prospective tokens simultaneously. If these tokens are validated, one forward pass can generate multiple tokens, thus increasing inferencing speed.

The prospective tokens can be proposed by a smaller, more efficient model or by part of the main model itself. By processing tokens in parallel, speculative decoding makes fuller use of each GPU, potentially doubling or tripling inferencing speed. The first versions of speculative decoding, introduced by DeepMind and Google researchers, relied on a separate draft model, while newer methods, such as the Medusa speculator, eliminate the need for a secondary model.
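
The draft-then-verify loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not IBM's implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for the large model and the speculator, decoding is greedy, and verification is written as a per-position loop where a real system would score all candidates in one batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is any function that maps a token sequence to the next token.
Model = Callable[[List[int]], int]


def target_model(tokens: List[int]) -> int:
    """Hypothetical large model: the next token is (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


def draft_model(tokens: List[int]) -> int:
    """Hypothetical cheap speculator: agrees with the target except after a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


def speculative_step(tokens: List[int], target: Model, draft: Model, k: int) -> List[int]:
    """Draft k tokens, keep the prefix the target agrees with, return new tokens."""
    # 1. Draft k candidate tokens autoregressively with the cheap model.
    candidates: List[int] = []
    ctx = list(tokens)
    for _ in range(k):
        nxt = draft(ctx)
        candidates.append(nxt)
        ctx.append(nxt)

    # 2. Verify the candidates against the target model, left to right.
    accepted: List[int] = []
    ctx = list(tokens)
    for cand in candidates:
        expected = target(ctx)
        if cand != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(cand)
        ctx.append(cand)

    # 3. Every candidate matched, so the target's pass yields one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: List[int], n_tokens: int, k: int = 3) -> Tuple[List[int], int]:
    """Generate n_tokens after the prompt; also return the number of steps used."""
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < n_tokens:
        tokens.extend(speculative_step(tokens, target_model, draft_model, k))
        steps += 1
    return tokens[: len(prompt) + n_tokens], steps


output, steps = generate([0], 8)
print(output, steps)
```

In this toy setup, the eight new tokens are produced in three verification steps instead of eight, and the output matches what the target model alone would have generated. The one rejected draft (after the token 4) costs nothing in correctness, only a shorter step.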

IBM researchers adapted the Medusa speculator by conditioning future tokens on each other rather than on the model’s next predicted token. This approach, combined with an efficient fine-tuning method using small and large batches of text, aligns the speculator’s responses closely with the LLM, significantly boosting inferencing speeds.

Paged Attention: Optimizing Memory Usage

Reducing LLM latency often compromises throughput due to increased GPU memory strain. Dynamic batching can mitigate this, but not when speculative decoding is also competing for memory. IBM researchers addressed this by employing paged attention, an optimization technique inspired by virtual memory and paging concepts from operating systems.
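
To give a rough sense of the memory pressure involved, the size of a KV-cache can be estimated with simple arithmetic: two tensors (keys and values) per layer, each holding one vector per attention head per token. The sketch below assumes a standard transformer layout and 16-bit values. The model dimensions in the example are hypothetical round numbers chosen for illustration, not the Granite 20B configuration.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_value: int = 2,  # assumption: fp16/bf16 storage
) -> int:
    """Estimate KV-cache size in bytes for a standard transformer."""
    # The leading factor of 2 accounts for one key and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


# Hypothetical model: 32 layers, 32 KV heads of dimension 128, 4096-token context.
one_user = kv_cache_bytes(32, 32, 128, 4096, batch_size=1)
eight_users = kv_cache_bytes(32, 32, 128, 4096, batch_size=8)
print(one_user / 1024**3, "GiB for one sequence")
print(eight_users / 1024**3, "GiB for eight sequences")
```

Under these assumptions, a single full-length sequence takes 2 GiB of cache and a batch of eight takes 16 GiB, which is why duplicating the cache for every speculated candidate quickly becomes expensive.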

Traditional attention algorithms store key-value (KV) sequences in contiguous memory, leading to fragmentation. Paged attention, however, divides these sequences into smaller blocks, or pages, that can be accessed as needed. This method minimizes redundant computation and allows the speculator to generate multiple candidates for each predicted word without duplicating the entire KV-cache, thus freeing up memory.
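
The bookkeeping behind this idea can be sketched as a block table with reference counts. The class below is a simplified illustration, not the code used by IBM or by any particular serving library: `PagedKVCache`, `BLOCK_SIZE`, and the method names are hypothetical, and only page ownership is tracked (a real system would also store and copy the key and value tensors).

```python
from typing import Dict, List

BLOCK_SIZE = 4  # tokens per page; hypothetical value chosen for illustration


class PagedKVCache:
    """Tracks which physical pages each sequence uses, with shared pages."""

    def __init__(self, num_blocks: int) -> None:
        self.free: List[int] = list(range(num_blocks))
        self.refcount: Dict[int, int] = {}      # page id -> number of users
        self.tables: Dict[str, List[int]] = {}  # sequence id -> page ids
        self.lengths: Dict[str, int] = {}       # sequence id -> tokens stored

    def _alloc(self) -> int:
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def add_sequence(self, seq_id: str) -> None:
        self.tables[seq_id] = []
        self.lengths[seq_id] = 0

    def append_token(self, seq_id: str) -> None:
        table = self.tables[seq_id]
        n = self.lengths[seq_id]
        if n % BLOCK_SIZE == 0:
            # Last page is full (or there is none yet): take a fresh page.
            table.append(self._alloc())
        elif self.refcount[table[-1]] > 1:
            # Last page is shared: give this sequence a private copy
            # (copy-on-write; the tensor copy itself is omitted here).
            self.refcount[table[-1]] -= 1
            table[-1] = self._alloc()
        self.lengths[seq_id] = n + 1

    def fork(self, parent: str, child: str) -> None:
        """Start a candidate that shares all of the parent's pages."""
        self.tables[child] = list(self.tables[parent])
        self.lengths[child] = self.lengths[parent]
        for block in self.tables[child]:
            self.refcount[block] += 1

    def release(self, seq_id: str) -> None:
        for block in self.tables.pop(seq_id):
            self.refcount[block] -= 1
            if self.refcount[block] == 0:
                del self.refcount[block]
                self.free.append(block)
        del self.lengths[seq_id]

    def blocks_in_use(self) -> int:
        return len(self.refcount)


cache = PagedKVCache(num_blocks=16)
cache.add_sequence("prompt")
for _ in range(10):              # a 10-token prompt occupies 3 pages
    cache.append_token("prompt")
for name in ("cand0", "cand1", "cand2"):
    cache.fork("prompt", name)   # candidates share the prompt's pages
    cache.append_token(name)     # each candidate adds one speculated token
print(cache.blocks_in_use())
```

In this example the prompt and its three candidates occupy 6 pages in total. Giving each of the four sequences its own full copy of the cache would take 12 pages, so sharing the common prefix halves the memory used here.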

Future Implications

IBM has integrated speculative decoding and paged attention into its Granite 20B code model. The IBM speculator has been open-sourced on Hugging Face, enabling other developers to adapt these techniques for their LLMs. IBM plans to implement these optimization techniques across all models on its watsonx platform, enhancing enterprise AI applications.

Image source: Shutterstock




Copyright © 2022 Crypeto News.
Crypeto News is not responsible for the content of external sites.
