Published: 2024-11-06 | Updated: 2025-04-01

Conferences I Follow

CCF-A

USENIX ATC(USENIX Annual Technical Conference)

PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch. USENIX ATC 2024

SC(International Conference for High Performance Computing, Networking, Storage, and Analysis)

ASPLOS(International Conference on Architectural Support for Programming Languages and Operating Systems)

Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow. ASPLOS (1) 2025

HPCA(IEEE International Symposium on High Performance Computer Architecture)

PPoPP(ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming)

OSDI(USENIX Symposium on Operating Systems Design and Implementation)

Orca: A Distributed Serving System for Transformer-Based Generative Models. OSDI 2022

EuroSys(European Conference on Computer Systems)

PLDI(ACM SIGPLAN Conference on Programming Language Design and Implementation)

ICML(International Conference on Machine Learning)

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty. ICML 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. ICML 2024
Fast Inference from Transformers via Speculative Decoding. ICML 2023

NeurIPS(Conference on Neural Information Processing Systems)

SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices. NeurIPS 2024
Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting. NeurIPS 2024
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. NeurIPS 2023
Blockwise Parallel Decoding for Deep Autoregressive Models. NeurIPS 2018

ACL(Annual Meeting of the Association for Computational Linguistics)

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding. ACL (1) 2024
BASS: Batched Attention-optimized Speculative Sampling. ACL (Findings) 2024
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL (Findings) 2024

CCF-B

ICS(International Conference on Supercomputing)

ICPP(International Conference on Parallel Processing)

CLUSTER(IEEE International Conference on Cluster Computing)

CGO(The International Symposium on Code Generation and Optimization)

IPDPS(IEEE International Parallel & Distributed Processing Symposium)

EMNLP(Conference on Empirical Methods in Natural Language Processing)

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees. EMNLP 2024
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding. EMNLP 2024

NAACL(North American Chapter of the Association for Computational Linguistics)

REST: Retrieval-Based Speculative Decoding. NAACL-HLT 2024

CCF-C

CF(ACM International Conference on Computing Frontiers)

NPC(IFIP International Conference on Network and Parallel Computing)

ICA3PP(International Conference on Algorithms and Architectures for Parallel Processing)

ICPADS(International Conference on Parallel and Distributed Systems)

HiPC(IEEE International Conference on High Performance Computing, Data and Analytics)

HPCC(IEEE International Conference on High Performance Computing and Communications)

CCGRID(IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing)

Not in the CCF list

MLSys(Conference on Machine Learning and Systems)

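This article is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). Please credit the source when reposting.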