We designed the Multimodal Embedding Model (MEM) architecture starting from the one proposed by Husain et al. for the CodeSearchNet challenge.

We publicly release the source code: src-finetuning.tar.xz

The following table compares our fine-tuning hyperparameters with those used by Husain et al.
| Parameter | Husain et al. | Our approach |
|---|---|---|
| Learning rate | 0.0005 | 0.0005 |
| Learning rate decay | 0.98 | 0.98 |
| Momentum | 0.85 | 0.85 |
| Dropout probability | 0.1 | 0.1 |
| Maximum sequence length (query) | 30 | 30 |
| Maximum sequence length (code) | 200 | 256 |
| Optimizer | Adam | LAMB |
| Maximum training epochs | 500 | 10 |
| Patience | 5 | 10 |
| Batch size | 450 | 32 |
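
As a minimal sketch of how these fine-tuning hyperparameters could be wired up in PyTorch, the snippet below collects the values from the table and builds the LAMB optimizer with an exponential learning-rate decay. The `Lamb` import (from the third-party `torch_optimizer` package) and the `model` argument are assumptions for illustration; this is not the released training code.

```python
import torch
from torch_optimizer import Lamb  # assumed third-party LAMB implementation

# Fine-tuning hyperparameters from the table above ("Our approach" column).
FINETUNE_CONFIG = {
    "learning_rate": 5e-4,
    "learning_rate_decay": 0.98,   # per-epoch exponential decay factor
    "dropout": 0.1,
    "max_query_length": 30,
    "max_code_length": 256,
    "max_epochs": 10,
    "patience": 10,
    "batch_size": 32,
}

def build_optimizer_and_scheduler(model: torch.nn.Module):
    """Create the LAMB optimizer and the exponential learning-rate schedule."""
    optimizer = Lamb(model.parameters(), lr=FINETUNE_CONFIG["learning_rate"])
    scheduler = torch.optim.lr_scheduler.ExponentialLR(
        optimizer, gamma=FINETUNE_CONFIG["learning_rate_decay"]
    )
    return optimizer, scheduler
```
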
Next, we report the BERT-specific hyperparameters used for both the code encoder and the query encoder, again compared with those of Husain et al.
| Parameter | Husain et al. | Our approach |
|---|---|---|
| Activation function | gelu | gelu |
| Attention heads | 8 | 8 |
| Hidden layers | 3 | 3 |
| Hidden size | 128 | 768 |
| Intermediate size | 512 | 3,072 |
| Vocabulary size | 10,000 | 30,522 |
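
The snippet below is an illustrative sketch of the encoder configuration in the table above, expressed with Hugging Face's `transformers.BertConfig`; parameters not listed in the table keep the library defaults. Instantiating two separate `BertModel` objects for code and queries is an assumption about how the two encoders are kept distinct, not the released configuration.

```python
from transformers import BertConfig, BertModel

# Encoder hyperparameters from the table above ("Our approach" column).
encoder_config = BertConfig(
    vocab_size=30_522,
    hidden_size=768,
    num_hidden_layers=3,
    num_attention_heads=8,
    intermediate_size=3_072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,      # dropout probability from the fine-tuning table
    max_position_embeddings=512,  # library default; covers the 256-token code inputs
)

# One encoder instance per modality (code and natural-language query).
code_encoder = BertModel(encoder_config)
query_encoder = BertModel(encoder_config)
```
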