CCF-A

USENIX ATC(USENIX Annual Technical Conference)

  • PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch. USENIX ATC 2024

SC(International Conference for High Performance Computing, Networking, Storage, and Analysis)

ASPLOS(International Conference on Architectural Support for Programming Languages and Operating Systems)

  • Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow. ASPLOS (1) 2025

HPCA(IEEE International Symposium on High Performance Computer Architecture)

PPoPP(ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming)

OSDI(USENIX Symposium on Operating Systems Design and Implementation)

  • Orca: A Distributed Serving System for Transformer-Based Generative Models. OSDI 2022

EuroSys(European Conference on Computer Systems)

PLDI(ACM SIGPLAN Conference on Programming Language Design and Implementation)

ICML(International Conference on Machine Learning)

  • EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty. ICML 2024
  • Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. ICML 2024
  • Fast Inference from Transformers via Speculative Decoding. ICML 2023

NeurIPS(Conference on Neural Information Processing Systems)

  • SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices. NeurIPS 2024
  • Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting. NeurIPS 2024
  • H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. NeurIPS 2023
  • Blockwise Parallel Decoding for Deep Autoregressive Models. NeurIPS 2018

ACL(Annual Meeting of the Association for Computational Linguistics)

  • LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding. ACL (1) 2024
  • BASS: Batched Attention-optimized Speculative Sampling. ACL (Findings) 2024
  • Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL (Findings) 2024

CCF-B

ICS(International Conference on Supercomputing)

ICPP(International Conference on Parallel Processing)

CLUSTER(IEEE International Conference on Cluster Computing)

CGO(The International Symposium on Code Generation and Optimization)

IPDPS(IEEE International Parallel & Distributed Processing Symposium)

EMNLP(Conference on Empirical Methods in Natural Language Processing)

  • EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees. EMNLP 2024
  • Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding. EMNLP 2024

NAACL(Annual Conference of the North American Chapter of the Association for Computational Linguistics)

  • REST: Retrieval-Based Speculative Decoding. NAACL-HLT 2024

CCF-C

CF(ACM International Conference on Computing Frontiers)

NPC(IFIP International Conference on Network and Parallel Computing)

ICA3PP(International Conference on Algorithms and Architectures for Parallel Processing)

ICPADS(International Conference on Parallel and Distributed Systems)

HiPC(IEEE International Conference on High Performance Computing, Data and Analytics)

HPCC(IEEE International Conference on High Performance Computing and Communications)

CCGRID(IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing)

Not ranked by CCF

MLSys(Conference on Machine Learning and Systems)