Experiments

To evaluate the models derived from our approach, we applied 10-fold cross-validation in all experiments: we split the entire dataset into ten equal folds, using nine for training and one for testing. We further split the data from the nine training folds into 90 % training and 10 % validation data, which yields approximately the sizes reported in the following table.

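| Language | Total | Training | Validation | Test |
|---|---|---|---|---|
| JavaScript | 85,049 | 68,889 | 7,654 | 8,504 |
| Java | 71,194 | 57,667 | 6,407 | 7,119 |
| Python | 87,231 | 70,657 | 7,850 | 8,723 |
| TOP | 243,474 | 197,213 | 21,912 | 24,347 |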
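As a rough check of the split described above, the sizes can be computed as fractions of the total: one fold in ten (10 %) for testing, and the remaining 90 % divided into 90 % training (81 % of the total) and 10 % validation (9 % of the total). The function name below is hypothetical, and truncating fractional sizes is an assumption; it happens to reproduce the reported figures.

```python
def approximate_split_sizes(total):
    """Approximate (training, validation, test) sizes for one cross-validation round.

    Assumption: fractional sizes are truncated, not rounded.
    """
    test = total * 10 // 100        # one of the ten folds
    validation = total * 9 // 100   # 10 % of the nine remaining folds
    training = total * 81 // 100    # 90 % of the nine remaining folds
    return training, validation, test


# Totals taken from the table of dataset sizes.
totals = {"JavaScript": 85049, "Java": 71194, "Python": 87231, "TOP": 243474}
for language, total in totals.items():
    print(language, approximate_split_sizes(total))
```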

Cross-validation yields ten observations per model, which lets us apply statistical tests and mitigate the risk of reporting spurious differences.
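The text does not say which statistical test is applied. Purely as an illustration, the sketch below implements an exact paired sign-flip permutation test on per-fold score differences, which is one common choice when two models are evaluated on the same folds. The function name and the example scores are hypothetical and are not results from this evaluation.

```python
from itertools import product


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired per-fold scores.

    Under the null hypothesis the sign of each per-fold difference is
    arbitrary, so every sign assignment is enumerated (2**10 for ten folds).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if statistic >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Made-up per-fold scores for two hypothetical models.
model_a = [0.61, 0.63, 0.60, 0.64, 0.62, 0.61, 0.65, 0.60, 0.63, 0.62]
model_b = [0.58, 0.60, 0.59, 0.61, 0.60, 0.57, 0.62, 0.59, 0.60, 0.61]
print(paired_sign_flip_p_value(model_a, model_b))
```

A small p-value suggests that the difference between the two models is unlikely to come from fold-to-fold variation alone.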

Since some question posts on StackOverflow may relate to multiple programming languages, we removed such intersections from the TOP dataset to avoid duplicates and ambiguities. This cleaning operation resulted in the removal of 1,145 pairs from TOP.
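A minimal sketch of this cleaning step is shown below. It assumes that every pair carries a post identifier and that each per-language dataset maps identifiers to pairs; the data layout and the function name are hypothetical, and the actual procedure may differ.

```python
from collections import Counter


def build_top(datasets):
    """Merge per-language datasets, dropping posts found under several languages.

    `datasets` maps a language name to a dict of {post_id: pair}.
    """
    # Count under how many languages each post identifier appears.
    counts = Counter(post_id for pairs in datasets.values() for post_id in pairs)
    merged = {}
    for language, pairs in datasets.items():
        for post_id, pair in pairs.items():
            if counts[post_id] == 1:  # keep only unambiguous posts
                merged[post_id] = (language, pair)
    return merged


# Toy data: post 2 is tagged with both JavaScript and Java, so it is removed.
toy = {
    "JavaScript": {1: ("question 1", "code 1"), 2: ("question 2", "code 2")},
    "Java": {2: ("question 2", "code 2"), 3: ("question 3", "code 3")},
    "Python": {4: ("question 4", "code 4")},
}
print(sorted(build_top(toy)))
```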

The following tables describe all the experiments we ran in our evaluation. For each of the models, we indicate a label that we use as a unique reference for that model.

Fine-tuned models

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | JavaScript | JavaScript | JavaScript | MEM–{NO+JS}–[JS]–(JS) |
| None | Java | Java | Java | MEM–{NO+JA}–[JA]–(JA) |
| None | Python | Python | Python | MEM–{NO+PY}–[PY]–(PY) |
| None | TOP | JavaScript | JavaScript | MEM–{NO+TP}–[JS]–(JS) |
| None | TOP | Java | Java | MEM–{NO+TP}–[JA]–(JA) |
| None | TOP | Python | Python | MEM–{NO+TP}–[PY]–(PY) |
| None | TOP | TOP | TOP | MEM–{NO+TP}–[TP]–(TP) |
| None | ALL | JavaScript | JavaScript | MEM–{NO+AL}–[JS]–(JS) |
| None | ALL | Java | Java | MEM–{NO+AL}–[JA]–(JA) |
| None | ALL | Python | Python | MEM–{NO+AL}–[PY]–(PY) |
| None | ALL | TOP | TOP | MEM–{NO+AL}–[TP]–(TP) |
| English | None | JavaScript | JavaScript | MEM–{EN+NO}–[JS]–(JS) |
| English | None | Java | Java | MEM–{EN+NO}–[JA]–(JA) |
| English | None | Python | Python | MEM–{EN+NO}–[PY]–(PY) |
| English | None | TOP | TOP | MEM–{EN+NO}–[TP]–(TP) |
| English | JavaScript | JavaScript | JavaScript | MEM–{EN+JS}–[JS]–(JS) |
| English | Java | Java | Java | MEM–{EN+JA}–[JA]–(JA) |
| English | Python | Python | Python | MEM–{EN+PY}–[PY]–(PY) |
| English | TOP | JavaScript | JavaScript | MEM–{EN+TP}–[JS]–(JS) |
| English | TOP | Java | Java | MEM–{EN+TP}–[JA]–(JA) |
| English | TOP | Python | Python | MEM–{EN+TP}–[PY]–(PY) |
| English | TOP | TOP | TOP | MEM–{EN+TP}–[TP]–(TP) |
| English | ALL | JavaScript | JavaScript | MEM–{EN+AL}–[JS]–(JS) |
| English | ALL | Java | Java | MEM–{EN+AL}–[JA]–(JA) |
| English | ALL | Python | Python | MEM–{EN+AL}–[PY]–(PY) |
| English | ALL | TOP | TOP | MEM–{EN+AL}–[TP]–(TP) |
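The labels follow a regular pattern: a model prefix, the two pre-training settings in braces, the training set in square brackets, and the test set in parentheses. The helper below is a hypothetical sketch of that naming scheme, with the two-letter codes inferred from the table; it covers only labels of this four-part shape (the `MEM` prefix by default, or another prefix such as `LUMEM`).

```python
# Two-letter codes inferred from the labels in the table.
CODES = {
    "None": "NO",
    "English": "EN",
    "JavaScript": "JS",
    "Java": "JA",
    "Python": "PY",
    "TOP": "TP",
    "ALL": "AL",
}


def make_label(pretrain_q, pretrain_c, training, test, prefix="MEM"):
    """Build a label such as MEM–{EN+TP}–[JS]–(JS) from the four settings."""
    return (
        f"{prefix}–{{{CODES[pretrain_q]}+{CODES[pretrain_c]}}}"
        f"–[{CODES[training]}]–({CODES[test]})"
    )


print(make_label("English", "TOP", "JavaScript", "JavaScript"))
```

Generating labels from the four settings keeps each label unique as long as no two experiments share the same combination.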

Baselines

Random

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | None | None | JavaScript | MEM–{NO+NO}–[NO]–(JS) |
| None | None | None | Java | MEM–{NO+NO}–[NO]–(JA) |
| None | None | None | Python | MEM–{NO+NO}–[NO]–(PY) |
| None | None | None | TOP | MEM–{NO+NO}–[NO]–(TP) |

Zero-shot

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | JavaScript | None | JavaScript | MEM–{NO+JS}–[NO]–(JS) |
| None | Java | None | Java | MEM–{NO+JA}–[NO]–(JA) |
| None | Python | None | Python | MEM–{NO+PY}–[NO]–(PY) |
| None | TOP | None | JavaScript | MEM–{NO+TP}–[NO]–(JS) |
| None | TOP | None | Java | MEM–{NO+TP}–[NO]–(JA) |
| None | TOP | None | Python | MEM–{NO+TP}–[NO]–(PY) |
| None | TOP | None | TOP | MEM–{NO+TP}–[NO]–(TP) |
| None | ALL | None | JavaScript | MEM–{NO+AL}–[NO]–(JS) |
| None | ALL | None | Java | MEM–{NO+AL}–[NO]–(JA) |
| None | ALL | None | Python | MEM–{NO+AL}–[NO]–(PY) |
| None | ALL | None | TOP | MEM–{NO+AL}–[NO]–(TP) |
| English | None | None | JavaScript | MEM–{EN+NO}–[NO]–(JS) |
| English | None | None | Java | MEM–{EN+NO}–[NO]–(JA) |
| English | None | None | Python | MEM–{EN+NO}–[NO]–(PY) |
| English | None | None | TOP | MEM–{EN+NO}–[NO]–(TP) |
| English | JavaScript | None | JavaScript | MEM–{EN+JS}–[NO]–(JS) |
| English | Java | None | Java | MEM–{EN+JA}–[NO]–(JA) |
| English | Python | None | Python | MEM–{EN+PY}–[NO]–(PY) |
| English | TOP | None | JavaScript | MEM–{EN+TP}–[NO]–(JS) |
| English | TOP | None | Java | MEM–{EN+TP}–[NO]–(JA) |
| English | TOP | None | Python | MEM–{EN+TP}–[NO]–(PY) |
| English | TOP | None | TOP | MEM–{EN+TP}–[NO]–(TP) |
| English | ALL | None | JavaScript | MEM–{EN+AL}–[NO]–(JS) |
| English | ALL | None | Java | MEM–{EN+AL}–[NO]–(JA) |
| English | ALL | None | Python | MEM–{EN+AL}–[NO]–(PY) |
| English | ALL | None | TOP | MEM–{EN+AL}–[NO]–(TP) |

No pre-train

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | None | JavaScript | JavaScript | MEM–{NO+NO}–[JS]–(JS) |
| None | None | Java | Java | MEM–{NO+NO}–[JA]–(JA) |
| None | None | Python | Python | MEM–{NO+NO}–[PY]–(PY) |
| None | None | TOP | TOP | MEM–{NO+NO}–[TP]–(TP) |

Lucene v8.6.1

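| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | None | None | JavaScript | LU–(JS) |
| None | None | None | Java | LU–(JA) |
| None | None | None | Python | LU–(PY) |
| None | None | None | TOP | LU–(TP) |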

DeepCS

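| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | None | JavaScript | JavaScript | DC–[JS]–(JS) |
| None | None | Java | Java | DC–[JA]–(JA) |
| None | None | Python | Python | DC–[PY]–(PY) |
| None | None | TOP | TOP | DC–[TP]–(TP) |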

Combined models

Fine-tuned models

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | JavaScript | JavaScript | JavaScript | LUMEM–{NO+JS}–[JS]–(JS) |
| None | Java | Java | Java | LUMEM–{NO+JA}–[JA]–(JA) |
| None | Python | Python | Python | LUMEM–{NO+PY}–[PY]–(PY) |
| None | TOP | JavaScript | JavaScript | LUMEM–{NO+TP}–[JS]–(JS) |
| None | TOP | Java | Java | LUMEM–{NO+TP}–[JA]–(JA) |
| None | TOP | Python | Python | LUMEM–{NO+TP}–[PY]–(PY) |
| None | TOP | TOP | TOP | LUMEM–{NO+TP}–[TP]–(TP) |
| None | ALL | JavaScript | JavaScript | LUMEM–{NO+AL}–[JS]–(JS) |
| None | ALL | Java | Java | LUMEM–{NO+AL}–[JA]–(JA) |
| None | ALL | Python | Python | LUMEM–{NO+AL}–[PY]–(PY) |
| None | ALL | TOP | TOP | LUMEM–{NO+AL}–[TP]–(TP) |
| English | None | JavaScript | JavaScript | LUMEM–{EN+NO}–[JS]–(JS) |
| English | None | Java | Java | LUMEM–{EN+NO}–[JA]–(JA) |
| English | None | Python | Python | LUMEM–{EN+NO}–[PY]–(PY) |
| English | None | TOP | TOP | LUMEM–{EN+NO}–[TP]–(TP) |
| English | JavaScript | JavaScript | JavaScript | LUMEM–{EN+JS}–[JS]–(JS) |
| English | Java | Java | Java | LUMEM–{EN+JA}–[JA]–(JA) |
| English | Python | Python | Python | LUMEM–{EN+PY}–[PY]–(PY) |
| English | TOP | JavaScript | JavaScript | LUMEM–{EN+TP}–[JS]–(JS) |
| English | TOP | Java | Java | LUMEM–{EN+TP}–[JA]–(JA) |
| English | TOP | Python | Python | LUMEM–{EN+TP}–[PY]–(PY) |
| English | TOP | TOP | TOP | LUMEM–{EN+TP}–[TP]–(TP) |
| English | ALL | JavaScript | JavaScript | LUMEM–{EN+AL}–[JS]–(JS) |
| English | ALL | Java | Java | LUMEM–{EN+AL}–[JA]–(JA) |
| English | ALL | Python | Python | LUMEM–{EN+AL}–[PY]–(PY) |
| English | ALL | TOP | TOP | LUMEM–{EN+AL}–[TP]–(TP) |

Random

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | None | None | JavaScript | LUMEM–{NO+NO}–[NO]–(JS) |
| None | None | None | Java | LUMEM–{NO+NO}–[NO]–(JA) |
| None | None | None | Python | LUMEM–{NO+NO}–[NO]–(PY) |
| None | None | None | TOP | LUMEM–{NO+NO}–[NO]–(TP) |

Zero-shot

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | JavaScript | None | JavaScript | LUMEM–{NO+JS}–[NO]–(JS) |
| None | Java | None | Java | LUMEM–{NO+JA}–[NO]–(JA) |
| None | Python | None | Python | LUMEM–{NO+PY}–[NO]–(PY) |
| None | TOP | None | JavaScript | LUMEM–{NO+TP}–[NO]–(JS) |
| None | TOP | None | Java | LUMEM–{NO+TP}–[NO]–(JA) |
| None | TOP | None | Python | LUMEM–{NO+TP}–[NO]–(PY) |
| None | TOP | None | TOP | LUMEM–{NO+TP}–[NO]–(TP) |
| None | ALL | None | JavaScript | LUMEM–{NO+AL}–[NO]–(JS) |
| None | ALL | None | Java | LUMEM–{NO+AL}–[NO]–(JA) |
| None | ALL | None | Python | LUMEM–{NO+AL}–[NO]–(PY) |
| None | ALL | None | TOP | LUMEM–{NO+AL}–[NO]–(TP) |
| English | None | None | JavaScript | LUMEM–{EN+NO}–[NO]–(JS) |
| English | None | None | Java | LUMEM–{EN+NO}–[NO]–(JA) |
| English | None | None | Python | LUMEM–{EN+NO}–[NO]–(PY) |
| English | None | None | TOP | LUMEM–{EN+NO}–[NO]–(TP) |
| English | JavaScript | None | JavaScript | LUMEM–{EN+JS}–[NO]–(JS) |
| English | Java | None | Java | LUMEM–{EN+JA}–[NO]–(JA) |
| English | Python | None | Python | LUMEM–{EN+PY}–[NO]–(PY) |
| English | TOP | None | JavaScript | LUMEM–{EN+TP}–[NO]–(JS) |
| English | TOP | None | Java | LUMEM–{EN+TP}–[NO]–(JA) |
| English | TOP | None | Python | LUMEM–{EN+TP}–[NO]–(PY) |
| English | TOP | None | TOP | LUMEM–{EN+TP}–[NO]–(TP) |
| English | ALL | None | JavaScript | LUMEM–{EN+AL}–[NO]–(JS) |
| English | ALL | None | Java | LUMEM–{EN+AL}–[NO]–(JA) |
| English | ALL | None | Python | LUMEM–{EN+AL}–[NO]–(PY) |
| English | ALL | None | TOP | LUMEM–{EN+AL}–[NO]–(TP) |

No pre-train

| Pre-training $E_q$ | Pre-training $E_c$ | Training | Test | Label |
|---|---|---|---|---|
| None | None | JavaScript | JavaScript | LUMEM–{NO+NO}–[JS]–(JS) |
| None | None | Java | Java | LUMEM–{NO+NO}–[JA]–(JA) |
| None | None | Python | Python | LUMEM–{NO+NO}–[PY]–(PY) |
| None | None | TOP | TOP | LUMEM–{NO+NO}–[TP]–(TP) |