Efficient LLM
FlashDecoding++: Faster Large Language Model Inference on GPUs (arXiv:2311.01282)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters (arXiv:2311.03285)
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (arXiv:2311.06243)
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores (arXiv:2311.05908)
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying (arXiv:2311.09578)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization (arXiv:2311.10126)
SparQ Attention: Bandwidth-Efficient LLM Inference (arXiv:2312.04985)
A Survey of Resource-efficient LLM and Multimodal Foundation Models (arXiv:2401.08092)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (arXiv:2401.15077)
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression (arXiv:2403.15447)
A Controlled Study on Long Context Extension and Generalization in LLMs (arXiv:2409.12181)