LMSys Leaderboard: Which LLM Is Currently the Best?

LMSys Large Language Model Leaderboard Analysis

Understanding the LMSys Large Language Model Leaderboard

In this post, we discuss a fascinating website, featured by NeuralNine on their YouTube channel, that ranks large language models by their current performance.

What is the LMSys Leaderboard?

The LMSys leaderboard belongs to the LMSYS Chatbot Arena, a crowdsourced evaluation platform for large language models. It works like a competition in which different LLMs are rated based on user votes.

How does it work?

The LLM rankings are based on Elo ratings, the same system used to rank chess players and competitors in other games. Users take part by entering a prompt and voting on the two responses it produces, each from a different model, without being told which model produced which response.
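The Elo principle can be sketched in a few lines of Python. This is a minimal illustration of the standard Elo formula, not the leaderboard's actual code: the function names are hypothetical, and the K-factor of 32 and starting rating of 1000 are common chess-style values assumed here for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (between 0 and 1)."""
    # A 400-point rating gap corresponds to 10:1 odds in favor of the stronger side.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one head-to-head vote.

    score_a is 1.0 if A's response was preferred, 0.0 if B's was,
    and 0.5 for a tie. k (assumed to be 32 here) caps how far a
    single vote can move a rating.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Points gained by one model are lost by the other, so the total is conserved.
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level at 1000; the user prefers model A.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because the expected score accounts for the rating gap, an upset win against a much higher-rated model moves both ratings further than a win the stronger model was expected to get.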

The Ranking System

In the video, NeuralNine noted that at the time of recording (January 29th), the top-ranked LLM was GPT-4 Turbo. Bard had recently jumped significantly in the rankings, and Mixtral was the best open-source model.

Contributing to the Leaderboard

To contribute to the leaderboard, users can visit chat.lmsys.org, enter a prompt, and compare the responses from two different LLMs. After they choose the better response, their vote feeds into each model's Elo rating.
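A stream of such votes can be turned into a ranked table by replaying each one through the Elo update. The sketch below assumes a simple vote log of `(model_a, model_b, winner)` tuples with placeholder model names and counts a tie as half a win; the record format, names, K-factor, and starting rating of 1000 are all illustrative assumptions, and the real leaderboard's computation may differ.

```python
from collections import defaultdict

# Hypothetical vote log: (model_a, model_b, winner), where winner is
# "model_a", "model_b", or "tie". Model names are placeholders.
battles = [
    ("model-x", "model-y", "model_a"),
    ("model-y", "model-z", "model_a"),
    ("model-x", "model-z", "tie"),
]

# Map each vote outcome to model_a's score (a tie counts as half a win; assumption).
SCORE_FOR_A = {"model_a": 1.0, "model_b": 0.0, "tie": 0.5}


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_leaderboard(battles, k: float = 32.0, initial: float = 1000.0):
    """Replay votes in order and return (model, rating) pairs, best first."""
    # Every model starts at the same assumed rating the first time it appears.
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in battles:
        score_a = SCORE_FOR_A[winner]
        delta = k * (score_a - expected_score(ratings[model_a], ratings[model_b]))
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


for rank, (model, rating) in enumerate(compute_leaderboard(battles), start=1):
    print(f"{rank}. {model}: {rating:.1f}")
```

Because each vote is applied in sequence, the final ratings depend on the order in which votes are processed; fitting all votes at once (for example with a Bradley-Terry model) removes that dependence.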

Conclusion

The LMSys leaderboard provides valuable insights into the current performance of various large language models. It can help users decide which LLM to use for their projects or research, based on their requirements and preferences. By participating in this crowdsourced evaluation, users help make the ratings a more accurate reflection of each model's capabilities.


The content of this post was generated by the artificial intelligence system of https://dicas.link
Watch the video on YouTube