Experiments
To evaluate the models derived from our approach, we applied 10-fold cross-validation in all experiments. We split the entire dataset into ten folds of approximately equal size, used nine folds for training and the remaining one for testing, and rotated the test fold across all ten.
We further split the data from the nine training folds into 90% training and 10% validation data, which yields the approximate split sizes reported in the following table.
Language | Total | Training | Validation | Test |
JavaScript | 85,049 | 68,889 | 7,654 | 8,504 |
Java | 71,194 | 57,667 | 6,407 | 7,119 |
Python | 87,231 | 70,657 | 7,850 | 8,723 |
TOP | 243,474 | 197,213 | 21,912 | 24,347 |
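The splitting procedure can be sketched in Python as follows. This is a minimal illustration under assumptions the text does not state: indices are shuffled once with a fixed seed, fold boundaries are placed at i·n/k rounded down (so fold sizes differ by at most one), the validation share is rounded down, and the helper name `cv_splits` and its parameters are hypothetical.

```python
import random

def cv_splits(n, k=10, val_percent=10, seed=0):
    """Yield (train, val, test) index lists for each of the k folds.

    Hypothetical helper: shuffles indices 0..n-1 once, cuts them into k
    contiguous folds, and for each fold holds out val_percent of the
    remaining folds as validation data.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i covers positions [i*n//k, (i+1)*n//k), so sizes are floor or ceil of n/k.
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = indices[bounds[i]:bounds[i + 1]]
        rest = indices[:bounds[i]] + indices[bounds[i + 1]:]
        n_val = len(rest) * val_percent // 100  # integer arithmetic, rounded down
        val, train = rest[:n_val], rest[n_val:]
        yield train, val, test

# Split sizes of the first fold for each dataset in the table.
sizes = {"JavaScript": 85_049, "Java": 71_194, "Python": 87_231, "TOP": 243_474}
for name, n in sizes.items():
    train, val, test = next(cv_splits(n))
    print(name, len(train), len(val), len(test))
```

For the first fold this sketch gives 8,504 test and 7,654 validation pairs for JavaScript, matching the table; its training counts (for example 68,891 for JavaScript) differ from the reported ones by at most two pairs, consistent with the table giving approximate sizes.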
Cross-validation yields ten observations per model (one per fold), which lets us apply statistical tests and so mitigates the risk of reporting spurious differences between models.
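The section does not name the statistical test. As one possible choice, the ten per-fold scores of two models can be compared with an exact paired permutation (sign-flip) test; a paired t-test or a Wilcoxon signed-rank test would be common alternatives. The function name and the scores below are hypothetical and are not results from our experiments.

```python
from itertools import product

def paired_permutation_test(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired per-fold scores.

    Under the null hypothesis that both models perform equally, the sign of
    each per-fold difference is exchangeable, so we enumerate all 2**k sign
    assignments and count those whose mean difference is at least as extreme
    as the observed one.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    observed = abs(sum(diffs)) / k
    extreme = 0
    for signs in product((1, -1), repeat=k):
        mean = abs(sum(s * d for s, d in zip(signs, diffs))) / k
        if mean >= observed - 1e-12:  # tolerance for floating-point round-off
            extreme += 1
    return extreme / 2 ** k

# Hypothetical per-fold scores of two models (illustration only).
model_a = [0.61, 0.59, 0.63, 0.60, 0.62, 0.58, 0.64, 0.61, 0.60, 0.62]
model_b = [0.57, 0.56, 0.60, 0.58, 0.59, 0.55, 0.60, 0.59, 0.57, 0.58]
print(paired_permutation_test(model_a, model_b))
```

With ten folds there are only 2^10 = 1,024 sign assignments, so exhaustive enumeration is cheap and the p-value is exact. The smallest attainable two-sided p-value is 2/1,024 ≈ 0.002, which the hypothetical scores reach because model A wins on every fold.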
Because some StackOverflow question posts may relate to multiple programming languages, we removed the posts shared between languages from the TOP dataset to avoid duplicates and ambiguities.
This cleaning step removed 1,145 pairs from TOP.
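One way to implement such a cleaning step is to count in how many per-language datasets each post appears and keep only the posts that occur exactly once. The sketch below assumes each dataset is a dictionary keyed by a StackOverflow post id; that layout, the helper `build_top`, and the toy data are hypothetical.

```python
from collections import Counter

def build_top(datasets):
    """Merge per-language datasets into TOP, dropping posts shared between languages.

    `datasets` maps a language name to a dict {post_id: (question, code)}; this
    layout is a hypothetical stand-in for the real data format. Returns the
    merged dataset and the set of post ids that were dropped.
    """
    # Count how many language datasets each post id appears in.
    counts = Counter(pid for pairs in datasets.values() for pid in pairs)
    shared = {pid for pid, c in counts.items() if c > 1}
    top = {}
    for pairs in datasets.values():
        for pid, pair in pairs.items():
            if pid not in shared:
                top[pid] = pair
    return top, shared

# Toy example: post 2 appears in both the JavaScript and Java data, so it is dropped.
datasets = {
    "JavaScript": {1: ("q1", "c1"), 2: ("q2", "c2")},
    "Java": {2: ("q2", "c2"), 3: ("q3", "c3")},
    "Python": {4: ("q4", "c4")},
}
top, shared = build_top(datasets)
print(sorted(top), sorted(shared))
```

Dropping every copy of a shared post, rather than keeping one, avoids having to decide which language the post belongs to.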
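The following tables list all the experiments in our evaluation.
For each model, we give a label that serves as its unique reference. A label has the form PREFIX–{$E_q$+$E_c$}–[Training]–(Test): the braces hold the pre-training data of $E_q$ and $E_c$, the square brackets the training data, and the parentheses the test data. The prefix identifies the model family: MEM for models built on our approach (including the Random, Zero-shot and No pre-train baselines), LU for Lucene, DC for DeepCS, and LUMEM for the combined models. LU labels omit the pre-training and training components, and DC labels omit the pre-training component. Abbreviations: NO = None, EN = English, JS = JavaScript, JA = Java, PY = Python, TP = TOP, AL = ALL.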
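The labels in the tables follow the pattern PREFIX–{Eq+Ec}–[Training]–(Test), with two-letter abbreviations for each dataset. Because the scheme is regular, a label can be generated from its row and compared against the one printed in the table, which helps catch copy errors. The helpers `make_label` and `check_row` below are a hypothetical sketch, not part of our tooling.

```python
ABBREV = {"None": "NO", "English": "EN", "JavaScript": "JS", "Java": "JA",
          "Python": "PY", "TOP": "TP", "ALL": "AL"}
DASH = "\u2013"  # the en dash used as separator in the labels

def make_label(prefix, test, train=None, eq=None, ec=None):
    """Build a label PREFIX–{Eq+Ec}–[Training]–(Test).

    Components passed as Python None are omitted (as in the LU and DC labels);
    the string "None" means "no data" and is abbreviated to NO.
    """
    parts = [prefix]
    if eq is not None and ec is not None:
        parts.append("{" + ABBREV[eq] + "+" + ABBREV[ec] + "}")
    if train is not None:
        parts.append("[" + ABBREV[train] + "]")
    parts.append("(" + ABBREV[test] + ")")
    return DASH.join(parts)

def check_row(prefix, row):
    """Return True if a five-column MEM/LUMEM table row matches its own label."""
    eq, ec, train, test, label = [cell.strip() for cell in row.split("|")[:5]]
    return make_label(prefix, test, train, eq, ec) == label

print(make_label("MEM", "Java", train="Java", eq="English", ec="None"))
print(make_label("DC", "TOP", train="TOP"))
print(make_label("LU", "Python"))
```

These calls print MEM–{EN+NO}–[JA]–(JA), DC–[TP]–(TP) and LU–(PY), matching the corresponding table entries.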
Fine-tuned models
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | JavaScript | JavaScript | JavaScript | MEM–{NO+JS}–[JS]–(JS) |
None | Java | Java | Java | MEM–{NO+JA}–[JA]–(JA) |
None | Python | Python | Python | MEM–{NO+PY}–[PY]–(PY) |
None | TOP | JavaScript | JavaScript | MEM–{NO+TP}–[JS]–(JS) |
None | TOP | Java | Java | MEM–{NO+TP}–[JA]–(JA) |
None | TOP | Python | Python | MEM–{NO+TP}–[PY]–(PY) |
None | TOP | TOP | TOP | MEM–{NO+TP}–[TP]–(TP) |
None | ALL | JavaScript | JavaScript | MEM–{NO+AL}–[JS]–(JS) |
None | ALL | Java | Java | MEM–{NO+AL}–[JA]–(JA) |
None | ALL | Python | Python | MEM–{NO+AL}–[PY]–(PY) |
None | ALL | TOP | TOP | MEM–{NO+AL}–[TP]–(TP) |
English | None | JavaScript | JavaScript | MEM–{EN+NO}–[JS]–(JS) |
English | None | Java | Java | MEM–{EN+NO}–[JA]–(JA) |
English | None | Python | Python | MEM–{EN+NO}–[PY]–(PY) |
English | None | TOP | TOP | MEM–{EN+NO}–[TP]–(TP) |
English | JavaScript | JavaScript | JavaScript | MEM–{EN+JS}–[JS]–(JS) |
English | Java | Java | Java | MEM–{EN+JA}–[JA]–(JA) |
English | Python | Python | Python | MEM–{EN+PY}–[PY]–(PY) |
English | TOP | JavaScript | JavaScript | MEM–{EN+TP}–[JS]–(JS) |
English | TOP | Java | Java | MEM–{EN+TP}–[JA]–(JA) |
English | TOP | Python | Python | MEM–{EN+TP}–[PY]–(PY) |
English | TOP | TOP | TOP | MEM–{EN+TP}–[TP]–(TP) |
English | ALL | JavaScript | JavaScript | MEM–{EN+AL}–[JS]–(JS) |
English | ALL | Java | Java | MEM–{EN+AL}–[JA]–(JA) |
English | ALL | Python | Python | MEM–{EN+AL}–[PY]–(PY) |
English | ALL | TOP | TOP | MEM–{EN+AL}–[TP]–(TP) |
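The fine-tuned configurations above follow a regular pattern: when $E_c$ is pre-trained on a single language, the model is trained and tested only on that language, whereas $E_c$ pre-trained on TOP or ALL, and (when $E_q$ is pre-trained on English) no $E_c$ pre-training, are each combined with all four training sets, always testing on the training data's language. A hypothetical Python helper that regenerates the 26 rows in table order:

```python
LANGS = ["JavaScript", "Java", "Python"]
SETS = LANGS + ["TOP"]  # the four training/test datasets

def fine_tuned_grid():
    """Return (E_q pre-training, E_c pre-training, training, test) tuples in table order."""
    rows = []
    for eq in ["None", "English"]:
        if eq == "English":
            # No E_c pre-training; the None+None case is the separate no-pre-train baseline.
            rows += [(eq, "None", s, s) for s in SETS]
        # E_c pre-trained on one language: train and test on that same language.
        rows += [(eq, lang, lang, lang) for lang in LANGS]
        # E_c pre-trained on TOP or ALL: combined with every training/test dataset.
        for ec in ["TOP", "ALL"]:
            rows += [(eq, ec, s, s) for s in SETS]
    return rows

# Zero-shot variants: same pre-training and test data, no training step.
zero_shot = [(eq, ec, "None", test) for eq, ec, _, test in fine_tuned_grid()]
```

The zero-shot tables use exactly these combinations of pre-training and test data without training, and the combined-model tables repeat both grids with the LUMEM prefix.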
Baselines
Random
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | None | None | JavaScript | MEM–{NO+NO}–[NO]–(JS) |
None | None | None | Java | MEM–{NO+NO}–[NO]–(JA) |
None | None | None | Python | MEM–{NO+NO}–[NO]–(PY) |
None | None | None | TOP | MEM–{NO+NO}–[NO]–(TP) |
Zero-shot
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | JavaScript | None | JavaScript | MEM–{NO+JS}–[NO]–(JS) |
None | Java | None | Java | MEM–{NO+JA}–[NO]–(JA) |
None | Python | None | Python | MEM–{NO+PY}–[NO]–(PY) |
None | TOP | None | JavaScript | MEM–{NO+TP}–[NO]–(JS) |
None | TOP | None | Java | MEM–{NO+TP}–[NO]–(JA) |
None | TOP | None | Python | MEM–{NO+TP}–[NO]–(PY) |
None | TOP | None | TOP | MEM–{NO+TP}–[NO]–(TP) |
None | ALL | None | JavaScript | MEM–{NO+AL}–[NO]–(JS) |
None | ALL | None | Java | MEM–{NO+AL}–[NO]–(JA) |
None | ALL | None | Python | MEM–{NO+AL}–[NO]–(PY) |
None | ALL | None | TOP | MEM–{NO+AL}–[NO]–(TP) |
English | None | None | JavaScript | MEM–{EN+NO}–[NO]–(JS) |
English | None | None | Java | MEM–{EN+NO}–[NO]–(JA) |
English | None | None | Python | MEM–{EN+NO}–[NO]–(PY) |
English | None | None | TOP | MEM–{EN+NO}–[NO]–(TP) |
English | JavaScript | None | JavaScript | MEM–{EN+JS}–[NO]–(JS) |
English | Java | None | Java | MEM–{EN+JA}–[NO]–(JA) |
English | Python | None | Python | MEM–{EN+PY}–[NO]–(PY) |
English | TOP | None | JavaScript | MEM–{EN+TP}–[NO]–(JS) |
English | TOP | None | Java | MEM–{EN+TP}–[NO]–(JA) |
English | TOP | None | Python | MEM–{EN+TP}–[NO]–(PY) |
English | TOP | None | TOP | MEM–{EN+TP}–[NO]–(TP) |
English | ALL | None | JavaScript | MEM–{EN+AL}–[NO]–(JS) |
English | ALL | None | Java | MEM–{EN+AL}–[NO]–(JA) |
English | ALL | None | Python | MEM–{EN+AL}–[NO]–(PY) |
English | ALL | None | TOP | MEM–{EN+AL}–[NO]–(TP) |
No pre-train
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | None | JavaScript | JavaScript | MEM–{NO+NO}–[JS]–(JS) |
None | None | Java | Java | MEM–{NO+NO}–[JA]–(JA) |
None | None | Python | Python | MEM–{NO+NO}–[PY]–(PY) |
None | None | TOP | TOP | MEM–{NO+NO}–[TP]–(TP) |
Lucene v8.6.1
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | None | None | JavaScript | LU–(JS) |
None | None | None | Java | LU–(JA) |
None | None | None | Python | LU–(PY) |
None | None | None | TOP | LU–(TP) |
DeepCS
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | None | JavaScript | JavaScript | DC–[JS]–(JS) |
None | None | Java | Java | DC–[JA]–(JA) |
None | None | Python | Python | DC–[PY]–(PY) |
None | None | TOP | TOP | DC–[TP]–(TP) |
Combined models
Fine-tuned models
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | JavaScript | JavaScript | JavaScript | LUMEM–{NO+JS}–[JS]–(JS) |
None | Java | Java | Java | LUMEM–{NO+JA}–[JA]–(JA) |
None | Python | Python | Python | LUMEM–{NO+PY}–[PY]–(PY) |
None | TOP | JavaScript | JavaScript | LUMEM–{NO+TP}–[JS]–(JS) |
None | TOP | Java | Java | LUMEM–{NO+TP}–[JA]–(JA) |
None | TOP | Python | Python | LUMEM–{NO+TP}–[PY]–(PY) |
None | TOP | TOP | TOP | LUMEM–{NO+TP}–[TP]–(TP) |
None | ALL | JavaScript | JavaScript | LUMEM–{NO+AL}–[JS]–(JS) |
None | ALL | Java | Java | LUMEM–{NO+AL}–[JA]–(JA) |
None | ALL | Python | Python | LUMEM–{NO+AL}–[PY]–(PY) |
None | ALL | TOP | TOP | LUMEM–{NO+AL}–[TP]–(TP) |
English | None | JavaScript | JavaScript | LUMEM–{EN+NO}–[JS]–(JS) |
English | None | Java | Java | LUMEM–{EN+NO}–[JA]–(JA) |
English | None | Python | Python | LUMEM–{EN+NO}–[PY]–(PY) |
English | None | TOP | TOP | LUMEM–{EN+NO}–[TP]–(TP) |
English | JavaScript | JavaScript | JavaScript | LUMEM–{EN+JS}–[JS]–(JS) |
English | Java | Java | Java | LUMEM–{EN+JA}–[JA]–(JA) |
English | Python | Python | Python | LUMEM–{EN+PY}–[PY]–(PY) |
English | TOP | JavaScript | JavaScript | LUMEM–{EN+TP}–[JS]–(JS) |
English | TOP | Java | Java | LUMEM–{EN+TP}–[JA]–(JA) |
English | TOP | Python | Python | LUMEM–{EN+TP}–[PY]–(PY) |
English | TOP | TOP | TOP | LUMEM–{EN+TP}–[TP]–(TP) |
English | ALL | JavaScript | JavaScript | LUMEM–{EN+AL}–[JS]–(JS) |
English | ALL | Java | Java | LUMEM–{EN+AL}–[JA]–(JA) |
English | ALL | Python | Python | LUMEM–{EN+AL}–[PY]–(PY) |
English | ALL | TOP | TOP | LUMEM–{EN+AL}–[TP]–(TP) |
Random
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | None | None | JavaScript | LUMEM–{NO+NO}–[NO]–(JS) |
None | None | None | Java | LUMEM–{NO+NO}–[NO]–(JA) |
None | None | None | Python | LUMEM–{NO+NO}–[NO]–(PY) |
None | None | None | TOP | LUMEM–{NO+NO}–[NO]–(TP) |
Zero-shot
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | JavaScript | None | JavaScript | LUMEM–{NO+JS}–[NO]–(JS) |
None | Java | None | Java | LUMEM–{NO+JA}–[NO]–(JA) |
None | Python | None | Python | LUMEM–{NO+PY}–[NO]–(PY) |
None | TOP | None | JavaScript | LUMEM–{NO+TP}–[NO]–(JS) |
None | TOP | None | Java | LUMEM–{NO+TP}–[NO]–(JA) |
None | TOP | None | Python | LUMEM–{NO+TP}–[NO]–(PY) |
None | TOP | None | TOP | LUMEM–{NO+TP}–[NO]–(TP) |
None | ALL | None | JavaScript | LUMEM–{NO+AL}–[NO]–(JS) |
None | ALL | None | Java | LUMEM–{NO+AL}–[NO]–(JA) |
None | ALL | None | Python | LUMEM–{NO+AL}–[NO]–(PY) |
None | ALL | None | TOP | LUMEM–{NO+AL}–[NO]–(TP) |
English | None | None | JavaScript | LUMEM–{EN+NO}–[NO]–(JS) |
English | None | None | Java | LUMEM–{EN+NO}–[NO]–(JA) |
English | None | None | Python | LUMEM–{EN+NO}–[NO]–(PY) |
English | None | None | TOP | LUMEM–{EN+NO}–[NO]–(TP) |
English | JavaScript | None | JavaScript | LUMEM–{EN+JS}–[NO]–(JS) |
English | Java | None | Java | LUMEM–{EN+JA}–[NO]–(JA) |
English | Python | None | Python | LUMEM–{EN+PY}–[NO]–(PY) |
English | TOP | None | JavaScript | LUMEM–{EN+TP}–[NO]–(JS) |
English | TOP | None | Java | LUMEM–{EN+TP}–[NO]–(JA) |
English | TOP | None | Python | LUMEM–{EN+TP}–[NO]–(PY) |
English | TOP | None | TOP | LUMEM–{EN+TP}–[NO]–(TP) |
English | ALL | None | JavaScript | LUMEM–{EN+AL}–[NO]–(JS) |
English | ALL | None | Java | LUMEM–{EN+AL}–[NO]–(JA) |
English | ALL | None | Python | LUMEM–{EN+AL}–[NO]–(PY) |
English | ALL | None | TOP | LUMEM–{EN+AL}–[NO]–(TP) |
No pre-train
Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
None | None | JavaScript | JavaScript | LUMEM–{NO+NO}–[JS]–(JS) |
None | None | Java | Java | LUMEM–{NO+NO}–[JA]–(JA) |
None | None | Python | Python | LUMEM–{NO+NO}–[PY]–(PY) |
None | None | TOP | TOP | LUMEM–{NO+NO}–[TP]–(TP) |