Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
Reese Levine, Rithik Sharma, Nitisha Jain, Abhijit Ramesh, Zheyuan Chen, N. Abbas, James Contini, Tyler Sorensen
May 2026
arXiv | May 2026