Result: Quest for True Names

Annual GitHub Data Challenge 2014 Entrant
Project Repository

Questions:

  1. Which words are typically used in variable names and function names?
  2. Do they differ across programming languages?
  3. Is there an interesting connection between those words?

Key Findings:

  1. "Get" is the most common word for function names, and "Name" is the most common word for variable names.
  2. Among the three major languages we chose, Java is the most uniform in its word choice, C is the least uniform, and Python falls somewhere in between.
  3. Conceptually related words like "key"/"value", "start"/"end", and "width"/"height" appear frequently in the same context.
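
One way to look for such related pairs is a simple co-occurrence count. The following is a minimal sketch, not the method used in this project. It assumes a "context" is the list of words taken from the identifiers in one function or file; the `cooccurrence` function and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(contexts):
    """Count unordered word pairs that appear together in the same context."""
    pairs = Counter()
    for words in contexts:
        # Deduplicate and sort so each pair is counted once per context
        # and always appears in the same order.
        unique = sorted(set(words))
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical contexts, e.g. the words found in four different functions.
contexts = [
    ["key", "value", "name"],
    ["key", "value"],
    ["start", "end", "value"],
    ["width", "height"],
]

pairs = cooccurrence(contexts)
print(pairs.most_common(1))  # [(('key', 'value'), 2)]
```

Pairs that rank high in such a count are candidates for "related" words. A raw count favors words that are frequent everywhere, so a real analysis would probably need some normalization.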

Data Used: Source code from the top 100 GitHub repositories in each language.

Comparison between Languages

Which Words Are Commonly Used for Names?

Variables
  #    Java       Python     C
  1.   Name       Self       Name
  2.   Key        Name       Data
  3.   Id         Args       Self
  4.   Value      Data       Len
  5.   Type       Value      Size
  6.   View       Path       Key
  7.   Request    Key        Inc
  8.   Context    Result     Buf
  9.   State      Kwargs     Get
  10.  Result     Response   Type
Types/Classes
  #    Java       Python     C
  1.   Test       Test       Ngx
  2.   Activity   Error      Omx
  3.   View       Meta       Type
  4.   List       Handler    Audio
  5.   Fragment   Case       State
  6.   Handler    Command    Int
  7.   Adapter    Field      Info
  8.   Factory    Tests      Rtmp
  9.   Request    View       Data
  10.  Http       File       Npy
Functions/Methods
  #    Java       Python     C
  1.   Get        Get        Get
  2.   Set        Equal      Set
  3.   Equals     Test       Ngx
  4.   On         Len        String
  5.   Is         Set        Py
  6.   Add        Append     Mrb
  7.   To         Add        Lua
  8.   String     Init       Zend
  9.   Create     Error      Init
  10.  Test       Join       Atl
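
The per-language tables that follow add Prefix, Suffix, and Bigram columns. The sketch below shows one way such counts could be produced. It is an illustration under assumptions, not the project's actual code: it assumes identifiers are split at underscores and camelCase boundaries, that a prefix is the first word and a suffix the last word of a multi-word identifier, and that a bigram is any two adjacent words. The names `split_identifier` and `count_parts` are hypothetical.

```python
import re
from collections import Counter

# Assumed splitting rule: runs of capitals (acronyms), capitalized or
# lowercase words, and digit runs. Underscores are skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(name):
    """Split an identifier into capitalized words, e.g. 'getName' -> ['Get', 'Name']."""
    return [w.capitalize() for w in _WORD.findall(name)]


def count_parts(identifiers):
    """Return (words, prefixes, suffixes, bigrams) counters for the identifiers."""
    words, prefixes, suffixes, bigrams = Counter(), Counter(), Counter(), Counter()
    for name in identifiers:
        parts = split_identifier(name)
        words.update(parts)
        if len(parts) >= 2:
            # Assumption: only multi-word identifiers have a prefix and a suffix.
            prefixes[parts[0]] += 1
            suffixes[parts[-1]] += 1
            bigrams.update(zip(parts, parts[1:]))
    return words, prefixes, suffixes, bigrams


# Hypothetical sample of identifiers.
names = ["getName", "setName", "toString", "file_name", "savedInstanceState"]
words, prefixes, suffixes, bigrams = count_parts(names)
print(words.most_common(1))  # [('Name', 3)]
```

Capitalizing each word makes `fileName` and `file_name` count toward the same entries, which is consistent with how the words are displayed in the tables.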

Top Words in Java

Variable (noun)
  #    Word       Prefix     Suffix       Bigram
  1.   Name       Saved      Id           Instance state
  2.   Key        New        Name         Saved instance
  3.   Id         Default    State        Res id
  4.   Value      Is         Type         Measure spec
  5.   Type       Max        View         Serial version
  6.   View       Request    Count        Version uid
  7.   Request    Current    Size         File name
  8.   Context    Last       Key          List view
  9.   State      Key        Value        Default value
  10.  Result     File       Code         Class name
Type/Class
  #    Word       Prefix     Suffix       Bigram
  1.   Test       Test       Test         Test case
  2.   Activity   Abstract   Activity     Action bar
  3.   View       Base       Fragment     Step test
  4.   List       Http       Adapter      View activity
  5.   Fragment   Default    View         Main activity
  6.   Handler    Simple     Handler      Pull to
  7.   Adapter    Image      Tests        To refresh
  8.   Factory    Action     Factory      Web socket
  9.   Request    Hystrix    Utils        List view
  10.  Http       Xml        Exception    Http client
Function/Method (verb)
  #    Word       Prefix     Suffix       Bigram
  1.   Get        Get        Equals       To string
  2.   Set        Set        String       Array list
  3.   Equals     Is         Exception    On create
  4.   On         On         Name         Primitive type
  5.   Is         Add        Type         Get string
  6.   Add        Test       Id           Get name
  7.   To         To         Value        Not null
  8.   String     Create     List         By id
  9.   Create     Read       View         Equal to
  10.  Test       Has        True         Get value
Combination
  #    Verb+Noun    Related Nouns
  1.   On State.
  2.   Get Key.
  3.   Invoke Arg.
  4.   Create State.
  5.   Get Class.
  6.   Get Name.
  7.   Set Value.
  8.   Get Id.
  9.   View View.
  10.  Type Byte.

Top Words in Python

Variable (noun)
  #    Word       Prefix     Suffix       Bigram
  1.   Self       New        Name         Content type
  2.   Name       Default    Id           Repo id
  3.   Args       Is         Path         Field name
  4.   Data       Max        Type         Err msg
  5.   Value      Num        Data         Ret val
  6.   Path       File       List         Complete apps
  7.   Key        Old        Dir          File name
  8.   Result     Start      Info         Module name
  9.   Kwargs     Last       Url          Return value
  10.  Response   Field      File         Segment info
Type/Class
  #    Word       Prefix     Suffix       Bigram
  1.   Test       Test       Test         Test case
  2.   Error      Prop       Error        Lang info
  3.   Meta       Base       Handler      Not found
  4.   Handler    Http       Case         To many
  5.   Case       File       Tests        Many to
  6.   Command    Git        View         View set
  7.   Field      About      Command      Field test
  8.   Tests      Invalid    Serializer   Foreign key
  9.   View       My         Info         Date time
  10.  File       Mock       Field        Stream reader
Function/Method (verb)
  #    Word       Prefix     Suffix       Bigram
  1.   Get        Get        Equal        Value error
  2.   Equal      Test       Error        Set up
  3.   Test       Set        True         Almost equal
  4.   Len        Add        Field        Char field
  5.   Set        Is         Name         Add option
  6.   Append     Create     File         Array equal
  7.   Add        Check      Raises       Add argument
  8.   Init       Parse      String       Add token
  9.   Error      Make       Path         Type error
  10.  Join       To         Key          Get object
Combination
  #    Verb+Noun    Related Nouns
  1.   Raises Error.
  2.   Equal Code.
  3.   Get Name.
  4.   Join Dir.
  5.   Equal Data.
  6.   Equal Result.
  7.   Get None.
  8.   Init Name.
  9.   Init Kwargs.
  10.  Equal Expected.

Top Words in C

Variable (noun)
  #    Word       Prefix     Suffix       Bigram
  1.   Name       Var        Name         Tsrmls cc
  2.   Data       Asn        Id           Var from
  3.   Self       Tsrmls     Data         Tsrmls dc
  4.   Len        Get        Size         Mrb valuemrb
  5.   Size       User       Len          Attrib list
  6.   Key        Num        Ptr          Var ui
  7.   Inc        Mrb        Type         Ui from
  8.   Buf        Ssl        List         Get ext
  9.   Get        New        Cc           Ref con
  10.  Type       Wgl        Key          From ui
Type/Class
  #    Word       Prefix     Suffix       Bigram
  1.   Ngx        Ngx        State        Omx audio
  2.   Omx        Omx        Type         Ngx rtmp
  3.   Type       Npy        Info         Protocol binary
  4.   Audio      Ft         Ctx          Ngx http
  5.   State      Mmal       Class        Audio param
  6.   Int        Seafile    Rec          Type def
  7.   Info       Protocol   Int          Mmal parameter
  8.   Rtmp       Glx        Data         Audio config
  9.   Data       Py         Def          Cls struct
  10.  Npy        Khronos    Object       Binary request
Function/Method (verb)
  #    Word       Prefix     Suffix       Bigram
  1.   Get        Ngx        Method       Ngx http
  2.   Set        Mrb        String       Arg info
  3.   Ngx        Py         Init         Ngx rtmp
  4.   String     Zend       Free         Php method
  5.   Py         Lua        Info         Ch st
  6.   Mrb        Atl        Error        Php me
  7.   Lua        Phalcon    New          Xt va
  8.   Zend       Php        St           Zend strl
  9.   Init       Get        Name         Zend hash
  10.  Atl        Vim        Ex           Mrb define
Combination
  #    Verb+Noun    Related Nouns
  1.   Mrb Mrb.
  2.   Atl Inc.
  3.   Cblas Inc.
  4.   Atl Alpha.
  5.   Nn Self.
  6.   Atl Beta.
  7.   Xt Null.
  8.   Atl Lda.
  9.   Mc Sse.
  10.  Ngx Cf.

Future Work


Yusuke Shinyama (euske)