Improving Semantic Consistency of Variable Names with Use-Flow Graph Analysis

URL: https://euske.github.io/
Yusuke Shinyama
Yoshitaka Arahori
Katsuhiko Gondow
(APSEC 2021 Paper #83)
  1. Background: Consistency is Crucial for Maintaining Large Software Projects
  2. Goal: Detecting Inconsistent Names in Source Code
  3. How to Catch Inconsistency?
  4. "Use Flow" of Variables
  5. Training and Prediction
  6. Experiment and Evaluation
  7. Discussion
  8. Conclusion

1. Background: Consistency is Crucial for Maintaining Large Software Projects

2. Goal: Detecting Inconsistent Names in Source Code

void printResult(Stream out, String result) {
  out.writeLine("result:"+result);
}

void printStat(Stream out, int stat) {
  out.writeLine("stat:"+stat);
}

void printInfo(Stream strm, String info) {
  strm.writeLine("info:"+info);  // XXX "strm" should be "out".
}

Note: "out" might not necessarily be the best name for an output stream, but it is consistent throughout the program.

We also tried to:

  1. Make an adjustable system:
    Each project may use the same name for different things, or different abbreviations for the same thing.
  2. Not use a dictionary or heuristics.
  3. Make the system transparent:

3. How to Catch Inconsistency?

  1. Construct a mapping between the usage and the name for each variable:
  2. Find all the variables that have the same usage:
  3. Compare the variable names and single out the outliers.
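
The three steps above can be sketched roughly as follows. This is only an illustration, not the system described in the talk: the class NameOutlierSketch, the Var type, and the string "usage key" are hypothetical names, and a simple majority vote stands in for whatever comparison the actual system performs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NameOutlierSketch {
    /** A variable name paired with a hypothetical key that summarizes its usage. */
    static final class Var {
        final String name;
        final String usageKey;
        Var(String name, String usageKey) {
            this.name = name;
            this.usageKey = usageKey;
        }
    }

    /** Returns the variables whose name differs from the majority name for their usage. */
    static List<Var> findOutliers(List<Var> vars) {
        // Steps 1 and 2: map each usage to all the variables that share it.
        Map<String, List<Var>> byUsage = new LinkedHashMap<>();
        for (Var v : vars) {
            byUsage.computeIfAbsent(v.usageKey, k -> new ArrayList<>()).add(v);
        }
        // Step 3: within each group, count the names and flag the non-majority ones.
        List<Var> outliers = new ArrayList<>();
        for (List<Var> group : byUsage.values()) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (Var v : group) {
                counts.merge(v.name, 1, Integer::sum);
            }
            String majority = null;
            int best = 0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > best) {
                    best = e.getValue();
                    majority = e.getKey();
                }
            }
            for (Var v : group) {
                if (!v.name.equals(majority)) {
                    outliers.add(v);
                }
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        // Usage keys are made up for this example (assumption).
        List<Var> vars = List.of(
            new Var("out", "#this:writeLine()"),
            new Var("out", "#this:writeLine()"),
            new Var("strm", "#this:writeLine()"));
        for (Var v : findOutliers(vars)) {
            System.out.println(v.name); // prints: strm
        }
    }
}
```

With the three print functions from Section 2, "strm" is the only name that disagrees with the majority for its usage, so it is the only one reported.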

4. "Use Flow" of Variables

4.1. Example of use flows

Take a look at the variable "line":

private BufferedReader fp;

public String get() {
    String line = fp.readLine();
    int i = line.indexOf(' ');
    return line.substring(0, i);
}

Here is its data flow visualization:

  [get()]
  fp          --#this--> readLine()
  readLine()  --#this--> indexOf()
  readLine()  --#this--> substring()
  ' '         --#arg0--> indexOf()
  indexOf()   --#arg1--> substring()
  0           --#arg0--> substring()
  substring() ---------> return

The red lines above are a use flow of "line":

This path shows what is assigned to the variable "line" and how it is used.
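
As a rough illustration of how such paths could be read off a data flow graph, the sketch below encodes the edges of get() and enumerates every forward path from the readLine() node, which produces the value assigned to "line". The class UseFlowSketch, the Edge type, and the "label:node" step format are assumptions made for this example, and the walk assumes an acyclic graph; the actual extraction in the system may differ.

```java
import java.util.ArrayList;
import java.util.List;

class UseFlowSketch {
    /** A labeled data flow edge; an empty label means a plain value flow. */
    static final class Edge {
        final String from;
        final String to;
        final String label;
        Edge(String from, String to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    /** Edges of the data flow graph of get(), transcribed from the example above. */
    static List<Edge> getGraph() {
        return List.of(
            new Edge("fp", "readLine()", "#this"),
            new Edge("readLine()", "indexOf()", "#this"),
            new Edge("readLine()", "substring()", "#this"),
            new Edge("' '", "indexOf()", "#arg0"),
            new Edge("indexOf()", "substring()", "#arg1"),
            new Edge("0", "substring()", "#arg0"),
            new Edge("substring()", "return", ""));
    }

    /** Enumerates every forward path from start as a list of "label:node" steps. */
    static List<List<String>> forwardPaths(List<Edge> graph, String start) {
        List<List<String>> paths = new ArrayList<>();
        walk(graph, start, new ArrayList<>(), paths);
        return paths;
    }

    // Depth-first walk; assumes the graph has no cycles (no visited set is kept).
    private static void walk(List<Edge> graph, String node,
                             List<String> prefix, List<List<String>> out) {
        boolean leaf = true;
        for (Edge e : graph) {
            if (!e.from.equals(node)) {
                continue;
            }
            leaf = false;
            List<String> next = new ArrayList<>(prefix);
            next.add(e.label.isEmpty() ? e.to : e.label + ":" + e.to);
            walk(graph, e.to, next, out);
        }
        // A node without outgoing edges ends a path.
        if (leaf && !prefix.isEmpty()) {
            out.add(prefix);
        }
    }

    public static void main(String[] args) {
        for (List<String> path : forwardPaths(getGraph(), "readLine()")) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

Starting from readLine(), this yields two paths: one through indexOf() into the second argument of substring(), and one that uses the value directly as the receiver of substring().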

4.2. Make It Interprocedural

private BufferedReader fp;

public String get() {
    String line = fp.readLine();
    int i = line.indexOf(' ');
    return line.substring(0, i);
}

public void show() {
    String name = get();
    System.out.println(name+"!!");
}

Note that "line" in function get() is now "name" in function show().

Here's the data flow graph:

  [get()]
  fp          --#this--> readLine()
  readLine()  --#this--> indexOf()
  readLine()  --#this--> substring()
  ' '         --#arg0--> indexOf()
  indexOf()   --#arg1--> substring()
  0           --#arg0--> substring()

  [get() to show()]
  substring() ---------> name

  [show()]
  name        --L------> +
  "!!"        --R------> +
  +           --#arg0--> println()
  System.out  --#this--> println()

The final use flow of "line":

This path represents the usage of the variable "line" in this program.

5. Training and Prediction

We trained a Bayesian probabilistic model that predicts a variable name from a given use flow:

fp.readLine() -> line -> #this:indexOf() -> #arg1:substring() -> assign:name

The following use flow predicts ??? = "line":

If the variable name is anything other than "line", it is inconsistent with this usage.
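
A minimal sketch of this kind of prediction is given below, assuming a multinomial naive Bayes model with add-one smoothing over use-flow steps treated as independent features. The class NameNaiveBayesSketch, its method names, and the feature strings are hypothetical; the actual model and feature encoding used in the system may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NameNaiveBayesSketch {
    private final Map<String, Integer> nameCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Map<String, Integer> featureTotals = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int total = 0;

    /** Records one variable: its name and the steps of its use flow. */
    void train(String name, List<String> features) {
        nameCounts.merge(name, 1, Integer::sum);
        total++;
        Map<String, Integer> counts =
            featureCounts.computeIfAbsent(name, k -> new HashMap<>());
        for (String f : features) {
            counts.merge(f, 1, Integer::sum);
            featureTotals.merge(name, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    /** log P(name) + sum of log P(feature | name), with add-one smoothing. */
    private double logScore(String name, List<String> features) {
        double score = Math.log((double) nameCounts.get(name) / total);
        Map<String, Integer> counts = featureCounts.get(name);
        // One extra slot in the denominator accounts for features never seen in training.
        int denom = featureTotals.getOrDefault(name, 0) + vocabulary.size() + 1;
        for (String f : features) {
            score += Math.log((counts.getOrDefault(f, 0) + 1.0) / denom);
        }
        return score;
    }

    /** Returns the most probable name for the use flow, or null if nothing was trained. */
    String predict(List<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String name : nameCounts.keySet()) {
            double s = logScore(name, features);
            if (s > bestScore) {
                bestScore = s;
                best = name;
            }
        }
        return best;
    }

    /** A name is flagged when it differs from the predicted name for its use flow. */
    boolean isInconsistent(String actualName, List<String> features) {
        String predicted = predict(features);
        return predicted != null && !predicted.equals(actualName);
    }

    public static void main(String[] args) {
        NameNaiveBayesSketch model = new NameNaiveBayesSketch();
        model.train("line", List.of("fp.readLine()", "#this:indexOf()", "#arg1:substring()"));
        model.train("line", List.of("fp.readLine()", "#this:indexOf()", "#arg1:substring()"));
        model.train("name", List.of("get()", "L:+"));
        List<String> flow = List.of("#this:indexOf()", "#arg1:substring()");
        System.out.println(model.predict(flow)); // prints: line
    }
}
```

Under this toy training data, a variable with the same use flow but a different name, say "str", would be flagged as inconsistent.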

Algorithm:

6. Experiment and Evaluation

Evaluated our method with the following projects: (#edges < #useflow)

Project                        kLoC   #vars    #nodes   #edges
ant (build tool)               112k   23,971   350k     5,211k
antlr4 (parser generator)      31k    7,131    74k      1,103k
bcel (byte code analyzer)      31k    6,583    80k      1,190k
compress (data compression)    24k    5,896    69k      929k
jedit (text editor)            115k   21,977   294k     6,106k
jhotdraw (diagram renderer)    80k    17,367   235k     2,351k
junit (unit testing)           9k     2,384    21k      280k
lucene (document indexing)     109k   30,341   414k     7,146k
tomcat (web server)            238k   49,275   649k     11,799k
weka (machine learning)        324k   59,274   943k     13,224k
xerces (XML parser)            114k   21,852   314k     7,017k
xz (data compression)          7k     1,825    23k      299k

Test subjects: authors (3) + grad students (6) = 9 people.

RQ1. Is Use Flow a Good Representation for Variable Usage?

Experiment 1. Variable Equivalence Test

Present a pair of variables (with their names hidden) to human subjects:

Pair R003
Choice:
DefaultJspCompilerAdapter.java
     ...

     */
    protected void addArg(CommandlineJava aa, String argument, String value) {
        if (value != null) {
            aa.createArgument().setValue(argument);
            aa.createArgument().setValue(value);
        }
     ...

DefaultJspCompilerAdapter.java
     ...

     */
    protected void addArg(CommandlineJava bb, String argument) {
        if (argument != null && !argument.isEmpty()) {
            bb.createArgument().setValue(argument);
        }
     ...

Tested for 12 projects × 5 variable pairs × 9 subjects = 540 questionnaires.

[Chart: responses per project (ant, antlr4, bcel, compress, jedit, jhotdraw, junit, lucene, tomcat, weka, xerces, xz, Avg.); categories: MustBeSame, CanBeSame, Different, Unknown]

Defect: We tested only the high-similarity pairs, so the test was not completely blind. :(

RQ2. Can the System Predict a Good (Consistent) Variable Name?

Experiment 2-1. Name Suggestion Test

Present a snippet to human subjects:

Rewrite R000 (11.118)
Choice: xxx
BuildException.java
     ...

     */
    public BuildException(String xxx, Throwable cause, Location location) {
        this(xxx, cause);
        this.location = location;
     ...

Evidence
BuildException.java
     ...

     */
    public BuildException(String message, Throwable cause) {
        super(message, cause);
    }
     ...

Their choices are (in random order):

  1. Original name (Orig)
  2. Our system suggestion (Ours)
  3. Baseline suggestion (Baseline)

Tested for 12 projects × 10 variables × 9 subjects = 1,080 questionnaires.

[Chart: responses per project (ant, antlr4, bcel, compress, jedit, jhotdraw, junit, lucene, tomcat, weka, xerces, xz, Avg.); categories: Ours, Orig+Baseline]

Experiment 2-2. Sending Patches to Developers

Based on our results, we manually submitted 12 patches to the open source projects.

RQ3. Is the System Output Explainable?

Experiment 3. Evidence Persuasiveness Test

  1. Choose 5 variable name suggestions that were highly ranked.
  2. Present the evidence (snippets) used for producing each suggestion.
  3. Ask the subjects to choose one of the following:
    1. Presented evidence is convincing. (#Good)
    2. Presented evidence is relevant. (#Soso)
    3. Presented evidence is irrelevant. (#Bad)
    4. Undecidable. (#Unknown)

Tested for 12 projects × 5 questions × 9 subjects = 540 questionnaires.

[Chart: responses per project (ant, antlr4, bcel, compress, jedit, jhotdraw, junit, lucene, tomcat, weka, xerces, xz, Avg.); categories: Good, Soso, Bad, Unknown]

The results did not show that our system produced good explanations for its suggestions. :(

Anecdotal Examples

Some of the system suggestions were good:

7. Discussion

  1. Use flows are somewhat good at representing variable usage.
  2. In real projects, our system suggested a better (more consistent) name than the original with 39% probability.
  3. We are not sure that our system produced a good explanation for its output.

7.1. Threats to Validity

Internal Validity (Did we answer RQs?)

  1. The subjects might have prior knowledge of the projects used.
  2. The results depend on each subject's proficiency in the language.
  3. It is not clear how many variables the proposed method can apply to.
  4. Not every use flow is correctly obtained.

External Validity (Is our result generalizable?)

  1. The programming language is limited to Java.
  2. Neither the number of projects nor the number of subjects is large enough.
  3. Neither dynamic dispatch nor variable aliasing is considered.
  4. The naive Bayes classifier could be improved.

8. Conclusion

