Revisiting Transformer Layer Parameterization Through Causal Energy Minimization
Jin Xu, Camille Couturier, Victor Ruhle, Saravan Rajmohan, James Hensman
May 2026
arXiv | May 2026
C. Trojan, Pavel Myshkov, P. Fearnhead, James Hensman, Tom Minka, Chris Nemeth
AISTATS 2026 | April 2026
Hao Kang, Srikant Bharadwaj, James Hensman, Tushar Krishna, Victor Ruehle, Saravan Rajmohan
arXiv | December 2024
Xi Wang, Liana Mikaelyan, Taketomo Isazawa, James Hensman
October 2024
arXiv | October 2024
Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman, Pashmina Cameron
Neural Information Processing Systems | March 2024
Preprint