How to protect the site source code?

Started by Megan Brown, Jul 19, 2022, 08:31 AM

Megan Brown (Topic starter)

I have been worried about this problem:
- how to properly protect the site source code from prying eyes: encryption of the code itself, passwords for files, etc.
I would appreciate your advice on how best to do this.



If you mean HTML, there is no way.
If you mean JS, probably not either.
If you mean CSS, most likely not either.
If you mean PHP, it is never sent to the browser anyway, so the user does not see it.


You can protect the source code of the site pages from being copied. True, these are only superficial measures and do not give a 100% guarantee.
You can add a script to the site that disables certain keys for the user: the F12 key, which opens the developer panel with the HTML code of the site, or the Ctrl + U combination, which opens the source code of the page. There are also encryption tools, for example Advanced HTML Encrypt and Password Protect: instead of the code of your page, only incomprehensible lines will be visible. :-X
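
As a rough illustration of the key-blocking script mentioned above, here is a minimal sketch. The function name isBlockedShortcut is made up for this example, and a browser may not let a page cancel every shortcut; the same commands also remain available from the browser menu.

```javascript
// Hypothetical helper: decides whether a key event is F12 or Ctrl+U.
function isBlockedShortcut(event) {
  const key = String(event.key || '').toLowerCase();
  if (key === 'f12') {
    return true;
  }
  return Boolean(event.ctrlKey) && key === 'u';
}

// Attach the handler only when running inside a browser page.
if (typeof document !== 'undefined') {
  document.addEventListener('keydown', function (event) {
    if (isBlockedShortcut(event)) {
      event.preventDefault(); // ask the browser to skip its default action
    }
  });
}
```

Keeping the decision in a separate function makes it easy to see exactly which keys are affected.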


The most naive technique: fighting the mouse

Many authors of "protection systems" begin by reasoning as follows.

To see the source text of a web page, the visitor must select the "View Source" command in the browser. The most typical way to do this is the pop-up menu opened by right-clicking the mouse. So we write a right-click handler in JavaScript which, instead of showing the context menu, displays some rude message. There is also the "View/Source" command in the main browser menu. So we put the main page inside a frame: then Internet Explorer will show only a trivial FRAMESET for this command. And what if a smart user sees the address of the framed page inside the FRAMESET and opens it directly? That can be handled too: it is enough to check in JavaScript that we are inside the right frame and, if not, immediately go to a web page with the "rude message".
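
The reasoning above can be sketched roughly like this. The helper name isOpenedDirectly and the address /denied.html are assumptions made for the example, not part of any real site.

```javascript
// Hypothetical helper: true when the page is the top-level window,
// that is, it was opened directly and not inside a frame.
function isOpenedDirectly(win) {
  return win.top === win.self;
}

// Run the browser-only part when a page is actually present.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  // Suppress the context menu that normally offers "View Source".
  document.addEventListener('contextmenu', function (event) {
    event.preventDefault();
  });

  // If the page was opened outside its frameset, leave for another page.
  if (isOpenedDirectly(window)) {
    window.location.replace('/denied.html'); // hypothetical address
  }
}
```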

One of the most obvious mistakes of such authors is that they forget about the browser cache. Usually, the easiest way to get all the source texts "hidden" in this way is by clearing the cache in Internet Explorer and then going to a "protected" web page. After that, it's enough to look into the "Temporary Internet Files" folder and find all the "hidden" HTML and JavaScript files there.

Caching, however, can be fought.

Fighting caching

More "savvy" developers of security mechanisms may remember about page caching and try to disable it. In principle, it's not that difficult. To prohibit caching, there are special elements in the HTTP response header:

  Pragma: no-cache

  Cache-Control: no-cache

(it makes sense to specify both elements). The HTTP response header is quite easy to configure on any server. For those who do not have access to the necessary settings of their server, there are equivalent META tags in the HTML itself:

<meta http-equiv="Pragma" content="no-cache">

<meta http-equiv="Cache-Control" content="no-cache">
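
For those who do control the server, here is a rough sketch of setting the same two fields from code. It assumes a Node.js-style response object with a setHeader method; the function name disableCaching is made up for this example.

```javascript
// Adds the two header fields from the text to a response object.
// Assumes `res` offers setHeader(name, value), as Node.js responses do.
function disableCaching(res) {
  res.setHeader('Pragma', 'no-cache');
  res.setHeader('Cache-Control', 'no-cache');
  return res;
}

// Typical use inside a Node.js request handler (not executed here):
//   function handler(req, res) {
//     disableCaching(res);
//     res.end(pageHtml); // pageHtml is a placeholder for the page text
//   }
```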

The obvious way of "hacking" such "protection" is to use a non-standard browser that ignores the "do not cache pages" instruction.

It should be borne in mind that if the hacker's only task is to get an exact copy of an HTML page or a linked JavaScript file on disk, then a "browser" that solves this task can be written in 10 minutes in any language such as Perl or Java.
All you need to do is send a standard HTTP request to the server (for instance, the same one that Internet Explorer issues), get the response and save it to disk. In most modern programming languages, this is trivial.
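
To give an idea of how little is needed, here is a minimal sketch of such a "browser". It is written in JavaScript for Node.js 18 or later (where fetch is available globally) rather than in Perl or Java; the names fileNameFor and savePage and the example.com address are placeholders.

```javascript
// Hypothetical helper: picks a local file name from the last URL path segment.
function fileNameFor(url) {
  const last = new URL(url).pathname.split('/').pop();
  return last || 'index.html';
}

// Downloads one resource and saves the response body to disk unchanged.
async function savePage(url) {
  const { writeFile } = await import('node:fs/promises');
  const response = await fetch(url);
  const body = Buffer.from(await response.arrayBuffer());
  const target = fileNameFor(url);
  await writeFile(target, body);
  return target;
}

// Example call (placeholder address, not executed here):
//   savePage('https://example.com/scripts/menu.js').then(console.log);
```

No cache is involved at all, so "do not cache" instructions simply have nothing to act on.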

There is also a general solution that copes with any kind of caching ban: a proxy server that is installed on the client computer and passes all HTTP requests through without modification.
Such a proxy can be completely "invisible" to both the client and the server. Along the way, it can save all HTML and JavaScript files passing through it to disk. I will say more: a proxy server with such capabilities has most likely existed for a long time, for instance among the free utilities for speeding up Internet access.

Non-standard processing of requests to files

One day I came across a curious protection scheme based on a non-standard reaction of a web server to a seemingly ordinary file. A complex JavaScript file with the traditional .js extension was linked to the HTML page. However, it was not an ordinary text file but a PHP script that returned JavaScript code to the browser. The web server was configured so that this particular .js file was processed by the PHP interpreter.

The PHP script fully analyzed the HTTP request header sent by the browser to the server. Each such header contains, among other things, the "Referer" field: the URL of the document that initiated the request (if any). For JavaScript files linked with the tag:

<script language="JavaScript" src="..."></script>

the "Referer" field in the request header normally contains the address of the HTML page inside which the tag is located. If you try to read the same JavaScript file by simply typing its URL in the browser's address bar, the "Referer" field will be missing.

The PHP script that "pretended" to be an ordinary .js file checked whether the "Referer" really matched the address of the web page to which this JavaScript was supposed to be linked, or one of such pages. If so, PHP returned the correct working JavaScript. Otherwise, it returned another JavaScript: also complex, but completely useless.

Of course, caching of this JavaScript file has been disabled.
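
The check described above might look roughly like the following sketch. Note that it is a JavaScript rendering for illustration, not the PHP script from that site; the page address and both script texts are placeholders.

```javascript
// Hypothetical list of pages that are allowed to load the script.
const ALLOWED_PAGES = ['https://example.com/app/index.html'];

const REAL_SCRIPT = 'startApplication();'; // placeholder for the working code
const DECOY_SCRIPT = 'var unused = 0;';    // placeholder for the useless code

// Returns the script text to send, based on the request's Referer value.
function chooseScript(referer) {
  if (typeof referer === 'string' && ALLOWED_PAGES.includes(referer)) {
    return REAL_SCRIPT;
  }
  return DECOY_SCRIPT;
}
```

In a Node.js request handler the value would come from req.headers.referer. Since the client chooses what to put in that field, the check only stops visitors who do not bother to send it.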

In principle, such protection is as easy to overcome as an ordinary caching ban, for instance with the client-side proxy server described in the previous section. However, a lazy hacker working with "improvised means" may well be confused.

Self-generation by the document.writeln() method

From what has been said above, it should be clear that protected pages are best designed so that knowing their source texts and the source texts of all linked JavaScript files is not enough.

The main technique here is self-generation of the page with the JavaScript method document.writeln(). The HTML page (possibly including some JavaScript) is "encrypted" as a meaningless-looking string of characters, a JavaScript string constant. Then a JavaScript function is called that decrypts this string, and its result is passed to document.writeln().

On the Internet, you can find special utilities that perform such a conversion with a ready-made HTML page. Some of them compress data at the same time, so that the "meaningless" string turns out to be shorter than the original HTML and JavaScript code.
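
Here is a toy sketch of the self-generation idea. The "encryption" is only a stand-in (every character code is shifted by one), and the names shift, encodePage and decodePage are made up for the example; real utilities use their own schemes.

```javascript
// Stand-in "cipher": shifts every UTF-16 code unit by delta.
function shift(text, delta) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) + delta);
  }
  return out;
}

function encodePage(html) {
  return shift(html, 1);
}

function decodePage(encoded) {
  return shift(encoded, -1);
}

// Done once, offline: turn the page into a meaningless-looking string.
const encoded = encodePage('<p>Hello</p>'); // "=q?Ifmmp=0q?"

// Done in the browser: decode the string and let the page write itself.
if (typeof document !== 'undefined') {
  document.writeln(decodePage(encoded));
}
```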

With a naive implementation, such protection is also easy to crack. After all, the web page necessarily contains the decryption procedure! Nothing prevents you from slightly tweaking the source text of the page and inserting, just before the call to document.writeln(), a printout of its argument somewhere it is easy to read.
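
One possible variant of this trick, sketched under the assumption that you can run your own code before the page script: wrap the writeln method itself so that it records its argument. The name tapWriteln is hypothetical.

```javascript
// Wraps doc.writeln so that every text passed to it is recorded
// in the `captured` array before the original method runs.
function tapWriteln(doc, captured) {
  const original = doc.writeln;
  doc.writeln = function (...parts) {
    captured.push(parts.join(''));
    return original.apply(doc, parts);
  };
}

// In a browser this would be applied to the real document, for example:
//   const seen = [];
//   tapWriteln(document, seen);
```

Whatever the decryption procedure produces then ends up in the array in plain form.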

Nevertheless, the idea of encryption is one of the most productive. We will see this in the following sections when we try to create a "serious" protection system. For now, we note that nothing guarantees that the first call of the decryption procedure will immediately "bring the fully decrypted source text of the web page to the hacker on a platter".
Maybe only a small part will be decrypted, which in turn contains a call to the decryption procedure for the next fragment. Maybe the web application consists of hundreds of small pages and interconnected scripts, in which it is not so easy to find and "correct" the calls to the decryption procedures.

There is one general problem to keep in mind when using the document.writeln() method. If you select the entire page content in Internet Explorer, copy it to the clipboard and then paste it into an HTML editor, for instance FrontPage Editor, the page content will be inserted in its "ready-made" form. If part of the web page (including some JavaScript fragments) was generated by calling document.writeln(), the HTML editor will get both the script containing the call and the HTML code that it created.

Code encryption

We now turn to more complex technologies for protecting the source code of pages. I will not claim that they provide one hundred percent protection. It may be possible to develop software that breaks the security mechanisms described below automatically and quite effectively. But in any case, breaking these mechanisms is not a trivial task.

About protecting visible HTML code

Let us make a reservation right away. The methods described below do not protect the final HTML (without JavaScript) that the user eventually sees on protected pages. Visible HTML can always be copied from Internet Explorer via the clipboard to an HTML editor, as mentioned in the previous section. It is difficult for me to imagine a situation where HTML already visible to the user is so complex that it is extremely difficult to reproduce from the contents of the screen and therefore worth protecting from analysis.

Serious protection of already rendered HTML is impossible in principle. This is obvious: you can always "slightly fix" a standard browser, such as Internet Explorer, so that while it parses the HTML code and renders it on the screen, all this HTML is written to a log on disk. But at a "frivolous" level, some protection is still possible.

Here is an example of such protection for Internet Explorer, admittedly not too "elegant". You can simply prevent the user from selecting any fragment of the web page, so that nothing can be copied to the clipboard. One way to do this in Internet Explorer is to execute something like the following construction 10 times a second:

var rng= document.body.createTextRange();


Self-decrypting functions

The most obvious way to build protection is based on encryption. We will focus only on protecting JavaScript code: any HTML code can be generated dynamically using JavaScript.

Let's assume that the JavaScript code, in keeping with good programming style, consists of a large number of fairly small functions. We'll start with the following idea.

Let's encrypt the main body of each function and add to its beginning code that decrypts and executes the body. In JavaScript, this means that the text of the function

function somefunction() {
    some operators;
}

is replaced with the equivalent text:

function somefunction() {
    var encrypted = "...";  // encrypted text of the same operators
    var source = decoding_function(encrypted);
    eval(source);  // execute the decrypted operators
}



To perform such a replacement, of course, you should write some kind of automatic (or automated) utility.
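
A minimal sketch of what such a utility could look like. Everything here is an assumption made for illustration: the XOR-and-hex "cipher", the fixed KEY, and the names encoding_function and protectFunction are made up; only decoding_function follows the naming used above.

```javascript
const KEY = 0x5a; // hypothetical fixed key shared by encoder and decoder

// Stand-in "encryption": XOR each code unit with KEY, emit 4 hex digits.
function encoding_function(source) {
  let out = '';
  for (let i = 0; i < source.length; i++) {
    out += (source.charCodeAt(i) ^ KEY).toString(16).padStart(4, '0');
  }
  return out;
}

// Reverses encoding_function.
function decoding_function(encrypted) {
  let out = '';
  for (let i = 0; i < encrypted.length; i += 4) {
    out += String.fromCharCode(parseInt(encrypted.slice(i, i + 4), 16) ^ KEY);
  }
  return out;
}

// Builds the text of the protected function from its name and plain body.
function protectFunction(name, body) {
  return [
    'function ' + name + '() {',
    '    var encrypted = "' + encoding_function(body) + '";',
    '    var source = decoding_function(encrypted);',
    '    return eval(source);',
    '}',
  ].join('\n');
}

// Example: the plain body "6 * 7" no longer appears in the generated text.
const protectedText = protectFunction('somefunction', '6 * 7');
```

The generated text relies on decoding_function being available on the page, and because the body is run through eval, the value of its last expression is what the function returns (a body containing a return statement would need a different wrapper).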