Meta AI has released LLaMA (Large Language Model Meta AI), a collection of foundation language models ranging from 7 billion to 65 billion parameters.
Because it is intended for research communities with limited access to large-scale infrastructure, LLaMA is deliberately smaller than its competitors.
While LLaMA-65B is competitive with DeepMind’s Chinchilla-70B and Google’s PaLM-540B, LLaMA-13B is more than ten times smaller than OpenAI’s GPT-3 (175B).
The study stands out by demonstrating that state-of-the-art performance can be achieved by training exclusively on publicly available data, without any proprietary datasets. Smaller models trained on more tokens (word fragments) are also easier to retrain and fine-tune for specific applications. The largest models, LLaMA 65B and LLaMA 33B, were trained on 1.4 trillion tokens, while LLaMA 7B was trained on one trillion tokens.
Like all LLMs, LLaMA generates text in a loop: it takes a sequence of words as input, predicts the next word, and feeds the result back in as input. To train the model, the team drew on text from the 20 most widely spoken languages, concentrating on those written in the Latin and Cyrillic alphabets.
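To make that loop concrete, here is a minimal sketch of greedy autoregressive decoding. The `DummyModel` and `DummyTokenizer` classes are toy stand-ins invented purely for illustration; LLaMA’s real tokenizer and network are vastly larger, but the generation loop itself has this shape.

```python
from typing import List

# Toy stand-ins for a causal language model and its tokenizer.
# These are NOT LLaMA's actual implementation or API.

class DummyTokenizer:
    vocab = ["<eos>", "the", "llama", "eats", "grass"]
    eos_id = 0  # id of the end-of-sequence token

    def encode(self, text: str) -> List[int]:
        return [self.vocab.index(w) for w in text.split() if w in self.vocab]

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)

class DummyModel:
    def __call__(self, token_ids: List[int]) -> List[float]:
        # A real model would return learned scores over the vocabulary;
        # this toy just favours the token after the most recent one.
        next_id = (token_ids[-1] + 1) % len(DummyTokenizer.vocab)
        return [1.0 if i == next_id else 0.0 for i in range(len(DummyTokenizer.vocab))]

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 20) -> str:
    token_ids = tokenizer.encode(prompt)           # text -> token ids
    for _ in range(max_new_tokens):
        logits = model(token_ids)                  # score every candidate next token
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        token_ids.append(next_id)                  # feed the prediction back in
        if next_id == tokenizer.eos_id:            # stop at end-of-sequence
            break
    return tokenizer.decode(token_ids)

print(generate(DummyModel(), DummyTokenizer(), "the llama"))
# -> "the llama eats grass <eos>"
```

Real systems replace the greedy argmax with sampling strategies such as temperature or top-p sampling, but the outer loop of predict, append, repeat is the same.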
According to researchers at Meta, the sheer size of large language models puts them out of reach for most of the research community.
“It is more challenging for academics to comprehend how and why these massive language models operate as a result of this access restriction. It has held back efforts to improve their dependability and address issues like bias, toxicity, and the potential for them to disseminate misleading information,” says Meta.
By shrinking the models and releasing them under a non-commercial license, Meta is attempting to make LLaMA more widely available.
LLaMA models will be made available on a case-by-case basis to academic researchers and to those affiliated with government, civil society, and academic organizations. Interested researchers can apply to Meta for access.
Like ChatGPT and other language models, LLaMA struggles with toxic remarks and incorrect or nonsensical answers. Meta acknowledges this in its announcement of LLaMA, noting that by sharing the model, researchers can “more easily explore novel techniques to minimize or get rid of these difficulties in large language models.”
The research group at Meta also published a set of benchmark evaluations of the model’s biases and toxicity, both to highlight its shortcomings and to encourage further research in this important field.