We pre-trained the $\text{BERT}$ models; the table below reports, for each language, the number of pre-training steps, the training time, and the resulting model performance.
| Language | Steps | Time (days) | $MCM_{PT}$ (%) | $NLP_{PT}$ (%) |
|---|---|---|---|---|
| JavaScript | 323,665 | 4 | 90 | 98 |
| Java | 190,662 | 2.4 | 87 | 96 |
| Python | 127,339 | 1.6 | 86 | 98 |
| TOP | 641,725 | 8 | 88 | 95 |
| ALL | 883,468 | 11 | 88 | 95 |
We share the pre-trained $\text{BERT}$ models so that they can be used as encoders for other downstream tasks. Our models are compatible with Google's $\text{BERT}$ source code, available at https://github.com/google-research/bert.
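As an illustration, the sketch below shows one way a released checkpoint could be plugged in as an encoder through Google's BERT codebase (TensorFlow 1.x). The extraction directory, file names (`bert_config.json`, `bert_model.ckpt`), and sequence length are assumptions for the example based on the standard BERT release layout, not a specification of our archives.

```python
# Minimal sketch, assuming the archive unpacks into the standard BERT layout
# (bert_config.json + bert_model.ckpt); requires TensorFlow 1.x and modeling.py
# from https://github.com/google-research/bert on the Python path.
import tensorflow as tf
import modeling

MODEL_DIR = "bertc-js"   # assumed extraction directory
MAX_SEQ_LEN = 128        # assumed sequence length for this example

bert_config = modeling.BertConfig.from_json_file(MODEL_DIR + "/bert_config.json")

input_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN], name="input_ids")
input_mask = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN], name="input_mask")

# Build the BERT graph; the final hidden states serve as the encoder output
# that a downstream task head can consume.
model = modeling.BertModel(
    config=bert_config,
    is_training=False,
    input_ids=input_ids,
    input_mask=input_mask,
    use_one_hot_embeddings=False,
)
sequence_output = model.get_sequence_output()  # [batch, seq_len, hidden_size]

# Initialize the graph variables from the shared pre-trained checkpoint.
tvars = tf.trainable_variables()
assignment_map, _ = modeling.get_assignment_map_from_checkpoint(
    tvars, MODEL_DIR + "/bert_model.ckpt")
tf.train.init_from_checkpoint(MODEL_DIR + "/bert_model.ckpt", assignment_map)
```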
Each archive is split into several part files. A single archive can be reassembled by concatenating the parts, for example:
cat bertc-js.tar.xz.part.* > bertc-js.tar.xz
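Once the parts are concatenated, the resulting archive can be unpacked with a command such as `tar -xJf bertc-js.tar.xz` (assuming a `tar` build with xz support).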