Changeset - 59f08ae812a4
[Not reviewed]
0 1 0
Bradley Kuhn (bkuhn) - 8 years ago 2016-08-09 22:16:44
bkuhn@ebb.org
Various fixes from regenerated from Markdown source
1 file changed with 3 insertions and 3 deletions:
0 comments (0 inline, 0 general)
www/conservancy/static/copyleft-compliance/vmware-code-similarity.html
...
 
@@ -3,7 +3,7 @@
 
{% block submenuselection %}VMwareCodeSimilarity{% endblock %}
 
{% block content %}
 

	
 
<h1 id="similarity-analysis-and-contribution-analysis-of-christoph-hellwigs-linux-code-as-found-in-vmware-esxi-5.5">Contribution and Similarity Analysis of Christoph Hellwig's Linux Code as found in VMware ESXi 5.5</h1>
 
<h1 id="contribution-and-similarity-analysis-of-christoph-hellwigs-linux-code-as-found-in-vmware-esxi-5.5">Contribution and Similarity Analysis of Christoph Hellwig's Linux Code as found in VMware ESXi 5.5</h1>
 
<p>This analysis verifies, by reproducible methods, a set of specific contributions that Christoph Hellwig clearly made to Linux, and shows how those contributions appear in the VMware ESXi 5.5 product.</p>
 
<p>This analysis was prepared and written by <a href="https://sfconservancy.org/about/staff/#bkuhn">Bradley M. Kuhn</a>.</p>
 
<h1 id="understanding-code-similarity-and-cloning">Understanding Code Similarity and &quot;Cloning&quot;</h1>
...
 
@@ -12,7 +12,7 @@
 
<h1 id="establishing-a-baseline-of-the-ccfinderx-tool">Establishing A Baseline of the CCFinderX Tool</h1>
 
<p>CCFinderX offers many statistics for clone detection. After expert analysis, we concluded that the statistic most relevant to this situation is the &quot;ratio of similarity&quot; between the existing code and the new code. To establish a baseline, we considered two different comparisons of Free and Open Source Software (FOSS). First, we compared the Linux kernel, Version 4.5.2, to the FreeBSD kernel, Version 10.3.0. This comparison was inspired by the similar 2002 study<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a> of these two large C programs. The hypothesis remained that CCFinderX would encounter a low but significant percentage of code similarity, since the FreeBSD and Linux projects collaborate on some subprojects and willingly share code under the 3-Clause BSD license for those parts. (These collaborations are public and well-documented.)</p>
 
<p>The experiment confirmed the hypothesis. We found a 3.68% &quot;ratio of similarity&quot; when comparing code from Linux to the FreeBSD kernel.</p>
 
<p>Next, we compared the source code of the Linux Kernel 4.5.2 to the LLVM+Clang system, version 3.8.0. These two projects are each a large program written in the C programming language, but they are not known to actively share code. We would expect some very minimal similarity simply due to chance, but something much lower than the 3.68% found between Linux and FreeBSD's kernel.</p>
 
<p>Next, we compared the source code of the Linux Kernel 4.5.2 to the LLVM+Clang system, version 3.8.0. These two projects are both large programs that are not known to actively share code. There may be some very minimal similarity simply due to chance, but something much lower than the 3.68% found between Linux and FreeBSD's kernel.</p>
 
<p>Indeed, when the same test was run to compare Linux to the LLVM+Clang system, the &quot;ratio of similarity&quot; was 0.075%.</p>
 
<h1 id="general-comparison-of-linux-kernel-to-vmware-sources">General Comparison of Linux Kernel to VMware sources</h1>
 
<p>With the baseline established, we now begin relevant comparisons. First, we compare the Linux kernel version 2.6.34 to the sources <a href="http://k.sfconservancy.org/vmkdrivers">released by VMware in their (partial) source release</a>. The &quot;ratio of similarity&quot; between Linux 2.6.34 and VMware's partial source release is 20.72%. There is little question that much of VMware's kernel has come from Linux.</p>
...
 
@@ -48,7 +48,7 @@ document.write('<a h'+'ref'+'="ma'+'ilto'+':'+e+'">'+e+'<\/'+'a'+'>');
 
<p>As before, after finding these separate occasions of contribution, I then extracted the source code lines that Hellwig added or changed in each contribution in this repository. I did so by carefully cross-referencing the commits that Hellwig performed with the output of <code>git blame</code>. I specifically <a href="https://github.com/conservancy/gpl-compliance-tools/blob/master/extract-code-added-in-commits.plx">used the same script as before</a> to carefully extract only lines that Hellwig changed or added in that repository, and placed only those contributions identifiable as Hellwig's into new files whose names matched the original filenames. This created a corpus of code that is verifiable as added or changed by Hellwig and no one else.</p>
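
<p>The idea of keeping only the lines that <code>git blame</code> attributes to one author can be sketched roughly as follows. This is an illustration only and is not the linked script: it assumes the input is the text produced by <code>git blame --line-porcelain</code>, it matches on the author name alone, and it does not cross-reference a list of commits. The function name <code>lines_by_author</code> and the sample data (including the placeholder e-mail address and file name) are invented for the example.</p>

```python
import re
from typing import List, Tuple

# Each porcelain entry begins with:
#   <40-hex commit id> <original line> <final line> [<lines in group>]
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: \d+)?$")


def lines_by_author(porcelain: str, author: str) -> List[Tuple[int, str]]:
    """Return (final_line_number, text) pairs for lines blamed on `author`.

    Assumes `porcelain` is `git blame --line-porcelain` output, where the
    header fields are repeated for every line and the source line itself
    follows the headers, prefixed by a single TAB.
    """
    kept = []
    current_author = None
    final_line = None
    for raw in porcelain.split("\n"):
        if raw.startswith("\t"):
            # The TAB-prefixed content line closes the current entry.
            if current_author == author:
                kept.append((final_line, raw[1:]))
            current_author = None
            continue
        match = HEADER.match(raw)
        if match:
            final_line = int(match.group(3))
        elif raw.startswith("author "):
            current_author = raw[len("author "):]
    return kept


# Invented sample input: three blamed lines, two attributed to Hellwig.
SAMPLE = (
    "1" * 40 + " 1 1 1\n"
    "author Christoph Hellwig\n"
    "author-mail <hch@example.org>\n"
    "filename drivers/example.c\n"
    "\tint example_init(void)\n"
    + "2" * 40 + " 2 2 1\n"
    "author Someone Else\n"
    "author-mail <other@example.org>\n"
    "filename drivers/example.c\n"
    "\t{\n"
    + "1" * 40 + " 3 3 1\n"
    "author Christoph Hellwig\n"
    "author-mail <hch@example.org>\n"
    "filename drivers/example.c\n"
    "\t\treturn 0;\n"
)

for number, text in lines_by_author(SAMPLE, "Christoph Hellwig"):
    print(number, text)
```

<p>Writing the kept lines to a new file named after the original would then give a per-file corpus of the kind described above. Matching on author name alone is a simplification; cross-referencing against a known list of commits, as the analysis describes, is stricter.</p>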
 
<h2 id="comparing-hellwigs-contributions-from-modern-linux-repository-to-vmware-sources">Comparing Hellwig's Contributions From Modern Linux Repository to VMware Sources</h2>
 
<p>I then used this corpus as input to CCFinderX again. Specifically, this CCFinderX comparison compared all known Hellwig-contributed material from the modern Linux repository to the partial VMware source release. CCFinderX found a ratio of similarity of 0.1615% between Hellwig's code and the source code in VMware's (partial) source release. CCFinderX specifically identified 23 distinct locations where substantial sections of code contributed by Hellwig appeared. These 23 locations are found in the following 19 functions: <code>mptsas_init</code>, <code>mptsas_get_linkerrors</code>, <code>megasas_build_and_issue_cmd</code>, <code>cciss_getgeo</code>, <code>mptsas_get_bay_identifier</code>, <code>phy_to_ioc</code>, <code>mptsas_sas_enclosure_pg0</code>, <code>SendIocInit</code>, <code>mptsas_parse_device_info</code>, <code>csmisas_sas_device_pg0</code>, <code>mptsas_sas_io_unit_pg0</code>, <code>mptsas_sas_io_unit_pg1</code>, <code>mptsas_sas_expander_pg1</code>, <code>mptsas_sas_enclosure_pg0</code>, <code>aac_handle_aif</code>, <code>mptsas_get_bay_identifier</code>, <code>mpt_host_page_alloc</code>, <code>mptsas_probe_one_phy</code>.</p>
 
<h2 id="changed-and-added-lines-create-an-impartial-picture">Changed And Added Lines Create an Incomplete Picture</h2>
 
<h2 id="changed-and-added-lines-create-an-incomplete-picture">Changed And Added Lines Create an Incomplete Picture</h2>
 
<p>In <a href="https://www.linuxfoundation.org/sites/main/files/publications/estimatinglinux.html"><em>Estimating the Total Cost of a Linux Distribution</em></a>, McPherson, Proffitt, and Hale-Evans write:</p>
 
<blockquote>
 
<p>Anyone who is familiar with kernel development, for instance, realizes that the highest man-power cost in its development is when code is deleted and modified. The amount of effort that goes into deleting and changing code, not just adding to it, is not reflected in the values associated with this estimate. Because in a collaborative development model, code is developed and then changed and deleted, the true value is far greater than the existing code base. Just think about the process: when a few lines of code are added to the kernel, for instance, many more have to be modified to be compatible with that change. The work that goes into understanding the dependencies and outcomes and then changing that code is not well represented in this study.</p>