Changeset - 3d4bf5159b37
[Not reviewed]
0 1 0
Bradley Kuhn (bkuhn) - 8 years ago 2016-08-09 04:55:58
bkuhn@ebb.org
Update with various changes.
1 file changed with 15 insertions and 15 deletions:
0 comments (0 inline, 0 general)
www/conservancy/static/copyleft-compliance/vmware-code-similarity.html
...
 
@@ -5,12 +5,5 @@
 

	
 
<h1>Similarity Analysis and Contribution Analysis of Christoph Hellwig's
 
  Linux Code as found in VMware ESXi 5.5</h1>
 

	
 
<p>This analysis verifies by reproducible analysis a set of specific
 
contributions that are clearly made by Christoph Hellwig to Linux, and shows
 
  how those contributions appear in the VMware ESXi 5.5 product.</p>
 

	
 
<p>This analysis was prepared and written by Bradley M. Kuhn.</p>
 

	
 
  
 
<h1 id="similarity-analysis-and-contribution-analysis-of-christoph-hellwigs-linux-code-as-found-in-vmware-esxi-5.5">Similarity Analysis and Contribution Analysis of Christoph Hellwig's Linux Code as found in VMware ESXi 5.5</h1>
 
<p>This analysis verifies by reproducible analysis a set of specific contributions that are clearly made by Christoph Hellwig to Linux, and shows how those contributions appear in the VMware ESXi 5.5 product.</p>
 
<p>This analysis was prepared and written by <a href="https://sfconservancy.org/about/staff/#bkuhn">Bradley M. Kuhn</a>.</p>
 
<h1 id="understanding-code-similarity-and-cloning">Understanding Code Similarity and &quot;Cloning&quot;</h1>
...
 
@@ -19,3 +12,3 @@ contributions that are clearly made by Christoph Hellwig to Linux, and shows
 
<h1 id="establishing-a-baseline-of-the-ccfinderx-tool">Establishing A Baseline of the CCFinderX Tool</h1>
 
<p>CCFinderX offers many statistics for clone detection. After expert analysis, we concluded that most relevant to this case is the &quot;ratio of similarity&quot; between the existing code and the new code. To establish a baseline, we considered two different comparisons of Free and Open Source Software (FOSS). First, we compared the Linux kernel, Version 4.5.2, to the FreeBSD kernel, Version 10.3.0. This comparison was inspired by the similar 2002 study <a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a> of these two large C programs. The hypothesis remained that CCFinderX would encounter a small percentage of code similarity, since the FreeBSD and Linux projects collaborate on some subprojects and willingly share code under the 3-Clause BSD license for those parts. (These collaborations are public and well-documented.)</p>
 
<p>CCFinderX offers many statistics for clone detection. After expert analysis, we concluded that most relevant to this situation is the &quot;ratio of similarity&quot; between the existing code and the new code. To establish a baseline, we considered two different comparisons of Free and Open Source Software (FOSS). First, we compared the Linux kernel, Version 4.5.2, to the FreeBSD kernel, Version 10.3.0. This comparison was inspired by the similar 2002 study <a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a> of these two large C programs. The hypothesis remained that CCFinderX would encounter a small percentage of code similarity, since the FreeBSD and Linux projects collaborate on some subprojects and willingly share code under the 3-Clause BSD license for those parts. (These collaborations are public and well-documented.)</p>
 
<p>The experiment confirmed the hypothesis. We found a 3.68% &quot;ratio of similarity&quot; when comparing code from Linux to the FreeBSD kernel.</p>
...
 
@@ -24,3 +17,3 @@ contributions that are clearly made by Christoph Hellwig to Linux, and shows
 
<h1 id="general-comparison-of-linux-kernel-to-vmware-sources">General Comparison of Linux Kernel to VMware sources</h1>
 
<p>With the baseline established, we now begin comparisons relevant to this case. First, we compare the Linux kernel version 2.6.34 to the sources released by VMware in their (partial) source release. The &quot;ratio of similarity&quot; between Linux 2.6.34 and VMware's partial source release is 20.72%. There is little question that much of VMware's kernel has come from Linux.</p>
 
<p>With the baseline established, we now begin relevant comparisons. First, we compare the Linux kernel version 2.6.34 to the sources released by VMware in their (partial) source release. The &quot;ratio of similarity&quot; between Linux 2.6.34 and VMware's partial source release is 20.72%. There is little question that much of VMware's kernel has come from Linux.</p>
 
<h1 id="methodology-of-showing-hellwigs-contributions-in-vmware-esxi-5.5-sources">Methodology Of Showing Hellwig's Contributions in VMware ESXi 5.5 Sources</h1>
...
 
@@ -33,5 +26,12 @@ contributions that are clearly made by Christoph Hellwig to Linux, and shows
 
<p>After finding these separate occasions of contribution, I then extracted the source code lines that Hellwig added or changed in each contribution in this repository. I did so by carefully cross-referencing the commits that Hellwig performed with the output of <code>git blame</code>. I specifically <a href="https://github.com/conservancy/gpl-compliance-tools/blob/master/extract-code-added-in-commits.plx">wrote a script</a> to extract only the lines that Hellwig changed or added in that repository, and placed only those contributions identifiable as Hellwig's into new files whose names matched the original filenames. This created a corpus of code that is verifiably added or changed by Hellwig and no one else.</p>
 
<p>Here are the specific commands I ran:</p>
<pre><code>$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git linux-historical
$ ./commit-id-list-matching-regex.plx `pwd`/linux-historical/.git Hellwig '(Submitted+by|originals+patch|patch+from|originally+by).*' &gt; hellwig-historical.ids
$ ./extract-code-added-in-commits.plx --repository=`pwd`/linux-historical --output-dir=`pwd`/hellwig-historical --central-commit e7e173af42dbf37b1d946f9ee00219cb3b2bea6a --progress --blame-opts=-M --blame-opts=-C &lt; ./hellwig-historical.ids
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-current
$ ./commit-id-list-matching-regex.plx `pwd`/linux-current/.git Hellwig '(Submitted+by|original+patch|patch+(from|by)|originally+(from|by)).*' &gt; ./hellwig-current.ids
$ ./extract-code-added-in-commits.plx --progress --repository=`pwd`/linux-current --output-dir=`pwd`/hellwig-through-2.6.34 --fork-limit=14 --blame-opts=-M --blame-opts=-M --blame-opts=-C --blame-opts=-C --central-commit e40152ee1e1c7a63f4777791863215e3faa37a86 &lt; hellwig-current.ids</code></pre>
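<p>The blame-based extraction described above can be illustrated with a minimal sketch. This is not the linked Perl script; it is a hypothetical Python helper (the name <code>lines_from_commits</code> is an assumption made for illustration) that reads the text produced by <code>git blame --line-porcelain</code> and keeps only the source lines attributed to a given set of commit IDs.</p>

```python
import re
from typing import Iterable, List

# In porcelain output, each blamed line begins with a header of the form:
#   <40-hex commit id> <original line no> <final line no> [<lines in group>]
HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+(?: \d+)?$")


def lines_from_commits(porcelain: str, commit_ids: Iterable[str]) -> List[str]:
    """Return the source lines that blame attributes to one of commit_ids.

    `porcelain` is assumed to be the text emitted by
    `git blame --line-porcelain <file>`. Hypothetical helper, for
    illustration only.
    """
    wanted = set(commit_ids)
    kept: List[str] = []
    current = None  # commit id from the most recent header line
    for raw in porcelain.split("\n"):
        if raw.startswith("\t"):
            # Content lines are prefixed with a single TAB character.
            if current in wanted:
                kept.append(raw[1:])
            continue
        match = HEADER.match(raw)
        if match:
            current = match.group(1)
    return kept
```

<p>In such a sketch, the commit IDs would come from a list like <code>hellwig-historical.ids</code>, and the kept lines would be written to a file named after the original source file.</p>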
 
<p>Note: e40152ee1e1c7a63f4777791863215e3faa37a86 is the 2.6.34 version created by Linus Torvalds <script type="text/javascript">
 
<!--
 
h='&#108;&#x69;&#110;&#x75;&#120;&#x2d;&#102;&#x6f;&#x75;&#110;&#100;&#x61;&#116;&#x69;&#x6f;&#110;&#46;&#x6f;&#114;&#x67;';a='&#64;';n='&#116;&#x6f;&#114;&#118;&#x61;&#108;&#100;&#x73;';e=n+a+h;
 
document.write('<a h'+'ref'+'="ma'+'ilto'+':'+e+'">'+e+'<\/'+'a'+'>');
 
// -->
 
</script><noscript>&#116;&#x6f;&#114;&#118;&#x61;&#108;&#100;&#x73;&#32;&#x61;&#116;&#32;&#108;&#x69;&#110;&#x75;&#120;&#x2d;&#102;&#x6f;&#x75;&#110;&#100;&#x61;&#116;&#x69;&#x6f;&#110;&#32;&#100;&#x6f;&#116;&#32;&#x6f;&#114;&#x67;</noscript> on 2010-05-16 14:17:36 -0700, with Git commit comment: &quot;Linux 2.6.34&quot;.</p>
 
<h2 id="comparing-hellwigs-contributions-from-linux-historical-repository-to-vmware-sources">Comparing Hellwig's Contributions From Linux Historical Repository to VMware Sources</h2>
 
<p>I then used this corpus as input to CCFinderX (similar to the other CCFinderX comparisons explained earlier). Specifically, this CCFinderX comparison compared all known Hellwig-contributed material from the historical Linux repository to the partial VMware source release. CCFinderX found a ratio of similarity of 0.0900% between Hellwig's code and the source code in VMware's (partial) source release. CCFinderX specifically identified 12 distinct locations where substantial sections of code contributed by Hellwig appeared in VMware's code.</p>
 
<p>Most notably, substantial portions of two of the core SCSI functions previously identified were also found by this search technique: <code>__scsi_device_lookup</code> and <code>__scsi_get_command</code>. Also showing substantial similarities in this search were the following functions: <code>mpt_get_product_name</code>, <code>scsi_proc_host_rm</code>, <code>mega_enum_raid_scsi</code>, <code>mega_m_to_n</code>, <code>mega_prepare_passthru</code>, <code>proc_scsi_show</code>, and <code>__down_read_trylock</code>.</p>
 
<p>Most notably, substantial portions of the following core SCSI functions were found by this search technique: <code>__scsi_device_lookup</code>, <code>__scsi_get_command</code>, <code>mpt_get_product_name</code>, <code>scsi_proc_host_rm</code>, <code>mega_enum_raid_scsi</code>, <code>mega_m_to_n</code>, <code>mega_prepare_passthru</code>, <code>proc_scsi_show</code>, and <code>__down_read_trylock</code>.</p>
 
<h2 id="extracting-hellwigs-contributions-from-modern-linux-repository">Extracting Hellwig's Contributions From Modern Linux Repository</h2>
...
 
@@ -51,4 +51,4 @@ contributions that are clearly made by Christoph Hellwig to Linux, and shows
 
<p>However, we can consider the process above to have found a bare minimum of Hellwig's contributions that appear in VMware's partial source release.</p>
 
<h1 id="further-analysis-of-examples-from-previous-filing-in-this-case">Further Analysis of Examples from Previous Filing in This Case</h1>
 
<p>In the previous filings in this case, we discussed eight critical C functions, written by Hellwig, that have near-equivalents in VMware's ESXi 5.5 product.</p>
 
<h1 id="further-analysis-of-additional-examples">Further Analysis of Additional Examples</h1>
 
<p>Separately from the analysis above, Hellwig identified a specific list of eight critical C functions to which he specifically recalls contributing, and near-equivalents were found in VMware's ESXi 5.5 product.</p>
 
<p>In this additional analysis, we used CCFinderX in a different way<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a>. In these tests, I confined the comparison to specific small sections of code that human analysis had previously identified as similar. In this way, I used CCFinderX to confirm with computational analysis what was already obvious to the human eye.</p>