Extraction of
High-level Semantics
from Software Code
Yusuke Shinyama
Tokyo Tech, Japan

Introduction

010110011110010... "Shows a picture of a kitty."
public int add
  ArrayList<St
  int nWidth =
  while (st <
    int height
    if (st * r
      img.chec
    } else if 
...

Contents

  1. Background
  2. Problem Description
  3. Methodology
  4. Conclusion

Disclaimer: Not much has been done.

Background

Why is Software So Important?

Software controls...

  • Your phone
  • Your payroll
  • Cars, trains and airplanes
  • Electricity and nuclear plants
  • Everything!

How many of you had a software glitch in the past month?

Software = Too much $$$

Software = Life or Death

Software = So Subtle

Apple's "GotoFail" (2014)

static OSStatus
SSLVerifySignedServerKeyExchange(...)
{
    OSStatus        err;
    ...
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

Problem Description

Problems

Attacking the Monster

Research Question

a = 100;
b = 2;
c = a + b;
d = a * c;
...
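
One way to read the assignments above is as a dataflow graph: each variable is a node, and an edge records which values feed which definition. The following is a minimal sketch under that reading, not the implementation used in this work. The names DataflowSketch, Assign, edges and slideProgram are made up for illustration, and the sketch assumes every variable is assigned only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the talk's implementation: def-use edges
// for straight-line assignments such as the four shown above.
final class DataflowSketch {

    // One assignment: the variable written and the variables read.
    // Literal operands (100, 2) are left out because they define no edge.
    static final class Assign {
        final String target;
        final List<String> reads;

        Assign(String target, String... reads) {
            this.target = target;
            this.reads = Arrays.asList(reads);
        }
    }

    // Returns edges "x -> y": the value of x flows into the definition of y.
    static List<String> edges(List<Assign> program) {
        Set<String> defined = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (Assign a : program) {
            for (String v : a.reads) {
                if (defined.contains(v)) {
                    out.add(v + " -> " + a.target);
                }
            }
            defined.add(a.target);
        }
        return out;
    }

    // The straight-line program from the slide.
    static List<Assign> slideProgram() {
        return Arrays.asList(
            new Assign("a"),            // a = 100;
            new Assign("b"),            // b = 2;
            new Assign("c", "a", "b"),  // c = a + b;
            new Assign("d", "a", "c")); // d = a * c;
    }

    public static void main(String[] args) {
        for (String e : edges(slideProgram())) {
            System.out.println(e);
        }
    }
}
```

Running main prints the four edges a -> c, b -> c, a -> d and c -> d. The graph records how values are combined, but nothing about what a, b, c or d stand for.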

Example 1 (opaque identifiers)

if (c == 1) {
    b = b - 100;
    a1 = a1 + 1;
} else if (c == 2) {
    b = b - 300;
    a2 = a2 + 1;
} else
...

Example 1 (descriptive identifiers)

if (choice == 1) {
    money = money - 100;
    item1 = item1 + 1;
} else if (choice == 2) {
    money = money - 300;
    item2 = item2 + 1;
} else
...
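
The two versions of Example 1 differ only in their identifiers. A rough way to check this is to rename identifiers in order of first appearance and compare the results. The sketch below assumes that reading. RenameSketch, normalize and the two string constants are hypothetical names, and the snippets stop at the second branch because the slides elide the rest.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rename identifiers to v0, v1, ... in order of first use.
final class RenameSketch {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");
    // Only the keywords that occur in the example are listed (an assumption).
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("if", "else"));

    static String normalize(String code) {
        Map<String, String> names = new HashMap<>();
        Matcher m = IDENT.matcher(code);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String id = m.group();
            String repl = id;
            if (!KEYWORDS.contains(id)) {
                repl = names.get(id);
                if (repl == null) {
                    // First time this identifier is seen: give it the next fresh name.
                    repl = "v" + names.size();
                    names.put(id, repl);
                }
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(repl));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // The visible part of both slides; the trailing "else ..." is omitted.
    static final String OPAQUE =
        "if (c == 1) { b = b - 100; a1 = a1 + 1; } "
        + "else if (c == 2) { b = b - 300; a2 = a2 + 1; }";
    static final String DESCRIPTIVE =
        "if (choice == 1) { money = money - 100; item1 = item1 + 1; } "
        + "else if (choice == 2) { money = money - 300; item2 = item2 + 1; }";

    public static void main(String[] args) {
        System.out.println(normalize(OPAQUE));
        System.out.println(normalize(OPAQUE).equals(normalize(DESCRIPTIVE)));
    }
}
```

Both versions normalize to the same text, so the program structure is identical. The "shopping" meaning a reader sees in the second version comes from names such as money and item1.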

Example 2 (opaque identifiers)

if (x) {
    y = y - 1;
}
if (y == 0) {
    g();
}
...

Example 2 (descriptive identifiers)

if (hit) {
    life = life - 1;
}
if (life == 0) {
    gameover();
}
...
Methodology

Basic Idea

Pattern Matching

Current Progress

  1. Download the top 1,000 GitHub repos.
    (11 GB of zipped text)
  2. Parse .java source files. [Eclipse JDT]
    (480,627 files, 74,088,883 LOC)
  3. Store dataflow graphs.
    (4,199,301 functions, 42,133,919 nodes)
  4. Implement pattern matching.
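
Steps 3 and 4 can be pictured with a toy example: operation nodes whose inputs are other nodes, and a pattern that is itself a small graph with wildcards. The slides do not specify the pattern language, so everything in this sketch is an assumption. That includes the class PatternSketch, the node labels, the "?" wildcard and the hand-built nodes for Example 1. A real pipeline would derive the nodes from parsed source rather than construct them by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of structural matching over dataflow nodes.
final class PatternSketch {

    // A node is an operation label plus the nodes whose values flow into it.
    static final class Node {
        final String label;
        final List<Node> inputs;

        Node(String label, Node... inputs) {
            this.label = label;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // True if node has the shape of pattern; the label "?" matches anything.
    static boolean matches(Node pattern, Node node) {
        if (pattern.label.equals("?")) {
            return true;
        }
        if (!pattern.label.equals(node.label)
                || pattern.inputs.size() != node.inputs.size()) {
            return false;
        }
        for (int i = 0; i < pattern.inputs.size(); i++) {
            if (!matches(pattern.inputs.get(i), node.inputs.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns every node in the list that matches the pattern.
    static List<Node> findAll(Node pattern, List<Node> nodes) {
        List<Node> hits = new ArrayList<>();
        for (Node n : nodes) {
            if (matches(pattern, n)) {
                hits.add(n);
            }
        }
        return hits;
    }

    // Hand-built expression nodes for the visible part of Example 1.
    static List<Node> example1() {
        Node var = new Node("var");
        Node con = new Node("const");
        return Arrays.asList(
            new Node("==", var, con),  // choice == 1
            new Node("-", var, con),   // money - 100
            new Node("+", var, con),   // item1 + 1
            new Node("==", var, con),  // choice == 2
            new Node("-", var, con),   // money - 300
            new Node("+", var, con));  // item2 + 1
    }

    // Pattern: "anything minus a constant".
    static Node subtractConstant() {
        return new Node("-", new Node("?"), new Node("const"));
    }

    public static void main(String[] args) {
        System.out.println(findAll(subtractConstant(), example1()).size());
    }
}
```

The pattern finds the two "subtract a constant" nodes in Example 1, whatever the variable is called.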
Wrap-up

Conclusions

References

  1. Charette, "This Car Runs on Code", IEEE Spectrum, 2009
  2. Koopman, "A Case Study of Toyota Unintended Acceleration and Software Safety", 2014
  3. Glass, Facts and Fallacies of Software Engineering, Addison-Wesley, 2002
  4. Eclipse Java development tools (JDT)