My benchmark for large language models
Date: 2024-02-19
Description
This summary was drafted with mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf
This collection of tests is derived from real-life conversations Carlini had with different LLMs. The benchmark includes tasks such as converting Python functions to equivalent but faster C functions, explaining what minified JavaScript does, identifying data encoding formats, writing parsers from BNF-like grammars, converting English sentences to SQL queries, and writing bash one-liners. He emphasizes the use of a simple dataflow domain-specific language (DSL) that makes it easy to add new tests and to evaluate model capabilities realistically.
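To make the dataflow idea concrete, here is a minimal Python sketch of how such a test could be expressed as a chain of stages. It is an illustration under stated assumptions, not the benchmark's actual code: the names `Stage`, `stub_model`, `extract_code`, `run_python` and `contains`, and the choice of `>>` to chain stages, are all hypothetical, and the model call is replaced by a canned reply so the example runs offline.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


class Stage:
    """A pipeline step wrapping a one-argument function; chain with >>."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, value=None):
        return self.fn(value)

    def __rshift__(self, other):
        # stage >> stage: feed this stage's output into the next one.
        return Stage(lambda value=None: other(self(value)))

    def __rrshift__(self, prompt):
        # "prompt" >> stage: a plain string starts the chain.
        return Stage(lambda _=None: self(prompt))


def stub_model(prompt):
    # Hypothetical stand-in for a real LLM call; always returns the same reply.
    return f"Sure, here it is:\n{FENCE}python\nprint('hello world')\n{FENCE}\n"


def extract_code(reply):
    # Pull the body of the first fenced block out of the model's reply.
    match = re.search(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, reply, re.DOTALL)
    return match.group(1) if match else reply


def run_python(source):
    # Execute the code and capture what it prints. Not sandboxed.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def contains(expected):
    # Final stage: pass if the expected substring appears in the output.
    return Stage(lambda output: expected in output)


# One test reads left to right: prompt, model, extraction, execution, check.
test = (
    "Write hello world in Python"
    >> Stage(stub_model)
    >> Stage(extract_code)
    >> Stage(run_python)
    >> contains("hello world")
)
passed = test()
print("passed" if passed else "failed")
```

Adding a new test in this style means writing one more chain, and swapping `stub_model` for a real model call leaves the rest of the pipeline unchanged. Since `run_python` executes generated code directly, a real harness would need some form of isolation.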
Read article here