Apple DCLM-7B Model

Apple has officially entered the open-source language model landscape with the release of DCLM-7B, including weights, training code, and the training dataset.

Key Highlights

  • Model Specifications: The 7B base model is trained on 2.5 trillion tokens of primarily English data with a 2,048-token context window.
  • Training Data: Combines datasets from DCLM-BASELINE, StarCoder, and ProofPile2.
  • Performance: The model achieves an MMLU score of 0.6372 (63.7%), placing it above Mistral-7B-v0.3 but below Llama 3 8B (see the comparison table below).
  • License: Released under an open license, specifically the Apple Sample Code License.
  • Comparison: Matches the performance of closed-dataset models like Mistral.
  • Training Framework: Developed using PyTorch and the OpenLM framework.
  • Availability: The model is accessible on Hugging Face and integrated with the Transformers library (see the loading sketch just below this list).
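
The article does not spell out the loading steps, so here is a minimal sketch of what usage might look like. The model ID apple/DCLM-7B and the open_lm dependency (used to register the OpenLM architecture with Transformers) are assumptions, not details from the article.

```python
# Minimal sketch: loading DCLM-7B through Hugging Face Transformers.
# Assumptions (not from the article): the checkpoint ID "apple/DCLM-7B" and
# that the OpenLM integration (open_lm) is installed so Transformers can
# resolve the architecture.
from open_lm.hf import *  # assumed dependency registering the OpenLM model classes
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "apple/DCLM-7B"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The model has a 2,048-token context window, so keep prompts well under that.
inputs = tokenizer("Machine learning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```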
| Model | Params | Tokens | Open dataset? | CORE | MMLU | EXTENDED |
|---|---|---|---|---|---|---|
| **Open weights, closed datasets** | | | | | | |
| Llama2 | 7B | 2T | ✗ | 49.2 | 45.8 | 34.1 |
| DeepSeek | 7B | 2T | ✗ | 50.7 | 48.5 | 35.3 |
| Mistral-0.3 | 7B | ? | ✗ | 57.0 | 62.7 | 45.1 |
| QWEN-2 | 7B | ? | ✗ | 57.5 | 71.9 | 50.5 |
| Llama3 | 8B | 15T | ✗ | 57.6 | 66.2 | 46.3 |
| Gemma | 8B | 6T | ✗ | 57.8 | 64.3 | 44.6 |
| Phi-3 | 7B | ? | ✗ | 61.0 | 69.9 | 57.9 |
| **Open weights, open datasets** | | | | | | |
| Falcon | 7B | 1T | ✓ | 44.1 | 27.4 | 25.1 |
| OLMo-1.7 | 7B | 2.1T | ✓ | 47.0 | 54.0 | 34.2 |
| MAP-Neo | 7B | 4.5T | ✓ | 50.2 | 57.1 | 40.4 |
| DCLM-7B | 7B | 2.5T | ✓ | 56.1 | 63.7 | 43.6 |
Comparison of the DCLM-7B model with other models in the 7B regime.

Model Card for DCLM-Baseline-7B

DCLM-Baseline-7B is a language model with 7 billion parameters, trained on the DCLM-Baseline dataset, which is part of the DataComp for Language Models (DCLM) benchmark. This model aims to demonstrate the benefits of systematic data curation techniques in enhancing language model performance.

Model Details

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
|---|---|---|---|---|---|
| 7B | 2.5T | 32 | 4096 | 32 | 2048 |
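
As a rough sanity check on how these dimensions add up to "7B", the sketch below uses the standard decoder-only parameter approximation. It is a back-of-envelope estimate, not the exact architecture: the 4x MLP width and the ~50k-token vocabulary are assumptions, since neither is stated in this article.

```python
# Back-of-envelope parameter estimate from the table above (an approximation,
# not the exact architecture: assumes a standard decoder-only transformer with
# an MLP about 4x the hidden size and an assumed ~50k-token vocabulary).
layers = 32
hidden = 4096
vocab = 50_000  # assumed; the actual tokenizer size is not given here

attention = 4 * hidden * hidden           # Q, K, V, and output projections
mlp = 2 * hidden * (4 * hidden)           # up and down projections at ~4x width
per_layer = attention + mlp               # ~= 12 * hidden^2
embeddings = vocab * hidden

total = layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # roughly 6.6B, consistent with "7B"
```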

This release marks a significant step for Apple, contributing to the open-source AI community and providing developers with robust tools for natural language processing tasks.
