Ukrainian Law Blog: A Look Inside the Forensic Analysis of Software Copyright Infringement

Bob Zeidman, Zeidman Consulting, Legaltech News

Before copyright litigation can be brought, the work must be registered with the U.S. Copyright Office. For most written works, like novels, authors can simply obtain the version of the novel that is on file with the U.S. Copyright Office and compare it to the allegedly infringing copy to determine what they have in common. These common sections may constitute copyright infringement if they are not covered by fair use.

With software, this simple process doesn't work. Software code contains both trade secrets in its functionality and copyrightable expression in the way it is written. Registering the software to protect the expression would expose the functionality and destroy the trade secrets, which by definition must be kept secret.

The U.S. Copyright Office has an unusual solution to this problem. When a program contains trade secret or other confidential material, the copy filed with the Copyright Office may consist of the first 10 and last 10 pages of the source code; or the first 25 and last 25 pages of object code with a 10-page consecutive segment of source code from any part of the program; or the first 25 and last 25 pages of source code with the portions containing trade secrets or confidential material blocked out.
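To see how small a share of a large program such a deposit can be, the sketch below selects the first and last pages of a source listing. The function name deposit_pages and the figure of 50 lines per page are assumptions made purely for illustration and are not taken from Copyright Office rules; the sketch also assumes that a listing shorter than the page limit is kept whole.

```python
def deposit_pages(source_lines, pages_each_end=25, lines_per_page=50):
    """Return the first and last pages of a listing as lists of lines.

    lines_per_page is a hypothetical page length chosen for illustration.
    """
    # Split the listing into consecutive pages.
    pages = [
        source_lines[start:start + lines_per_page]
        for start in range(0, len(source_lines), lines_per_page)
    ]
    # Assumption: a listing that fits within the limit is kept whole.
    if len(pages) <= 2 * pages_each_end:
        return pages
    return pages[:pages_each_end] + pages[-pages_each_end:]


if __name__ == "__main__":
    listing = [f"line {i}" for i in range(10000)]
    selected = deposit_pages(listing)
    kept = sum(len(page) for page in selected)
    print(f"{kept} of {len(listing)} lines would appear in the deposit")
```

Under these assumptions a 10,000-line program would be represented by a quarter of its lines, and that is before any redaction of trade secret material.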

What is particularly strange about this accommodation, though, is that the purpose of copyright law is to give protection to authors to encourage them to release their works so that the ideas expressed within can be disseminated, understood, and improved upon. With software, you can submit 50 pages of almost entirely redacted code (i.e., nearly blank pages) and still register the copyright.

There is, however, a downside. In order to litigate against an infringer, the copyright must be registered with the Copyright Office. It can be registered any time before the litigation is filed, and I've seen copyrights registered minutes before filing.

Therefore, to perform a forensic analysis of software in a copyright infringement case, these steps should be performed:

First, the code that the plaintiff produces as the registered version should be compared against the code on file with the Copyright Office. The un-redacted code on file should exactly match the corresponding code in that version. If it does, the expert can confirm "to the best of his or her ability" that the code that the plaintiff claims was registered is the code that was actually registered. The less code that was submitted to the Copyright Office, the less confident the expert can be, which leaves the defendant with an argument that the appropriate code was never registered and thus cannot support a lawsuit.

Even worse is the case where the plaintiff cannot produce the exact version of the code that was registered. Then this first comparison may show that the code produced by the plaintiff is definitely different from the code registered with the Copyright Office, leaving serious doubt as to whether the allegedly infringed code is covered by the registration.
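The first comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the deposit is assumed to have been transcribed into text, blocked-out lines are assumed to be marked with the hypothetical string "[REDACTED]", and whitespace is collapsed so that transcription spacing alone does not count as a mismatch. The function names are made up for this example.

```python
REDACTION_MARKER = "[REDACTED]"  # hypothetical marker for blocked-out lines


def normalize(line):
    """Collapse runs of whitespace so spacing differences are ignored."""
    return " ".join(line.split())


def unmatched_deposit_lines(deposit_lines, produced_lines):
    """List (line_number, text) for deposit lines absent from the produced code.

    Blank lines and redacted lines in the deposit are skipped, because
    they carry nothing that can be compared.
    """
    produced = {normalize(line) for line in produced_lines}
    missing = []
    for number, line in enumerate(deposit_lines, start=1):
        text = normalize(line)
        if not text or text == REDACTION_MARKER:
            continue
        if text not in produced:
            missing.append((number, text))
    return missing


if __name__ == "__main__":
    deposit = ["int total = 0;", "[REDACTED]", "return total;"]
    produced = ["int total = 0;", "total += price * qty;", "return total;"]
    print(unmatched_deposit_lines(deposit, produced))
```

An empty result is consistent with the produced code being the registered version, but it does not prove it, since the redacted portions cannot be checked. Any entry in the result points to a line the expert would need to explain.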

The next step is to compare all of the code from the registered program with all of the code from the allegedly infringed program. Remember, while the code submitted to the Copyright Office is only a fraction of the entire program, it is the entire program code that is considered to be registered. This comparison must show that the two programs are substantially the same, so that the allegedly infringed code is covered by the registration. What is meant by "substantially the same" is open to interpretation. Spelling corrections and changes to lines here and there throughout the program are generally not considered substantial. Adding a new routine would be considered substantial, and the new routine would not be covered by the previous registration.
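A rough way to separate scattered line edits from larger additions is to diff the two versions. The sketch below uses Python's standard difflib module; the function name summarize_changes and the block_threshold value are assumptions for illustration, and a block of inserted lines is only a candidate for review as a possible new routine, not a finding that the change is substantial.

```python
import difflib


def summarize_changes(registered_lines, asserted_lines, block_threshold=3):
    """Split the differences between two versions into two groups.

    Returns (scattered, new_blocks): a count of lines touched by small
    edits, and a list of inserted blocks of at least block_threshold
    consecutive lines. block_threshold is an arbitrary illustrative value.
    """
    matcher = difflib.SequenceMatcher(
        None, registered_lines, asserted_lines, autojunk=False
    )
    scattered = 0
    new_blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        added = asserted_lines[j1:j2]
        if tag == "insert" and len(added) >= block_threshold:
            # A run of purely new lines: flag it for closer review.
            new_blocks.append(added)
        else:
            # Replacements, deletions, and short insertions.
            scattered += max(i2 - i1, j2 - j1)
    return scattered, new_blocks


if __name__ == "__main__":
    registered = ["a = 1", "b = 2", "print(a + b)"]
    asserted = ["a = 1", "b = 3", "print(a + b)",
                "def extra():", "    x = 1", "    return x"]
    scattered, new_blocks = summarize_changes(registered, asserted)
    print(scattered, new_blocks)
```

The counts only organize the evidence. Whether a given change is substantial remains a judgment for the expert and, ultimately, the court.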

The third step is to compare the allegedly infringed code to the allegedly infringing code to find similarities. Any "substantial" correlations that cannot be explained by reasons other than copying are indications of copyright infringement. The six reasons for correlation are common author, common algorithms, commonly used identifier names, third-party code, automatically generated code, and copying. The meaning of "substantial" in this context is open to interpretation but is definitely not related to percentages of matching lines. A small percentage of copied code can be very useful or creative and thus be considered substantial for showing infringement.
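The filtering idea in this third step can be illustrated with a simplified sketch. It matches whole lines only, which is far cruder than a real forensic comparison, and it assumes the expert has already collected lines with an innocent explanation (third-party code, generated code, and so on) into a list called explained_lines. That list, the function names, and the min_length cutoff for ignoring trivial lines such as a lone brace are all hypothetical.

```python
def normalize(line):
    """Collapse runs of whitespace so spacing differences are ignored."""
    return " ".join(line.split())


def unexplained_matches(infringed_lines, accused_lines,
                        explained_lines=(), min_length=8):
    """Return (line_number, text) for lines shared by both programs
    that are not in the already-explained set.

    min_length is an arbitrary cutoff that skips trivial lines.
    """
    explained = {normalize(line) for line in explained_lines}
    accused = {normalize(line) for line in accused_lines}
    matches = []
    seen = set()
    for number, line in enumerate(infringed_lines, start=1):
        text = normalize(line)
        if len(text) < min_length or text in seen:
            continue
        if text in accused and text not in explained:
            matches.append((number, text))
            seen.add(text)
    return matches


if __name__ == "__main__":
    infringed = ["#include <stdio.h>",
                 "int frobnicate_totals(int *v, int n) {",
                 "}"]
    accused = ["#include <stdio.h>",
               "int  frobnicate_totals(int *v, int n) {",
               "}"]
    print(unexplained_matches(infringed, accused,
                              explained_lines=["#include <stdio.h>"]))
```

The sketch deliberately reports the matching lines themselves rather than a percentage, since what matters is what was copied, not how much. Each remaining match still has to be examined against the other possible reasons for correlation before copying is concluded.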

Only if these steps are performed can an accurate and correct conclusion be drawn in a copyright litigation.
