One of my readers wrote me to assist him in writing a code that mimic a function of a computer science virus. In fact, he was working on an exciting research project and he was in need to make a demonstration. The idea is simple, the virus (in our case, a python script) must eat a python module file and for each functions inside the module, replace the each line of the body by another one line task. In others words, the script takes as input a python file, and go through each line to locate a block of function and inside this function, replaces each task by another task. For example, our single line action would be: print(“Hey, I'm a Virus that changes your function body :-)”)
As he was in hurry, he wanted something that can work for a classic and basic well written python module. So, we suppose that our python module works well and starts with a block of import statements, then follow only some functions. There is no class inside the module. The module cannot be run directly; it can only be imported in another program. That means there is no if statement that test the built-in __name__ variable (if __name__ == ‘__main__’). Inside the module body, we may have comments anywhere starting with # character. Also, we may encounter documentation string starting and ending by triple quotes, anywhere in the module. Last case is the fact that we can have a comment following the definition of a function (aka inline comment). Based on these basics constraints, i wrote a small script that did the job. In this article, I will go through the process and the code.
To make my script more flexible, I introduced an action to request the filename of our source module to corrupt. Our script starts by the main function. Then each line is analyzed to see if it is the beginning of a function. The analysis is done inside another function called isNew_function_declaration_line.
Clearly, we call the function isNew_function_declaration_line inside the main function and apply it on each line of our source module. This function takes a string variable as a parameter. Let’s have a look at:
A block of function starts with the def keyword then we have one space, then we have a left parenthesis then follow some parameters separated by comma then we have a right parenthesis and finally a colon. This is the classic first line.
Another case, we may have what we call inline comment after the colon. That means, after the colon, we have a space then follow a pound symbol. We may also have documentation string with single or double quotes after the function declaration line.
So, with these above patterns we can check if a given line is the starting point of a function or not. Our function will return a Boolean value (true in case it is a valid function declaration and false if not).
The import statement was written on one line : from os import system, path, _exit
Now that our core function is ready, let’s have a look at the main function. The main function will use two global variables which are the source module filename and the new action to insert inside the body of each function. Well, here is what we’re going to do:
We will go through each line of the source module file and pass that line to our isNew_function_declaration_line function then based on the return value (true or false) some actions will be taken. For example, if isNew_function_declaration_line return true, we will flag a local variable to say that we are now going inside a function body. So at the next iteration (next line to analyze), the actions will consist of replacing the content of the function by our unwanted action [print(“Hey, I'm a Virus that changes your function body :-)”)]. Of course, we will not modify comments lines. In reality, we create a new python file inside the same folder of the source module, and for each line of the source module processed, the result (action taken) will be put inside this new file. At the end, we will remove the source module and rename the new file by the name of the source module file. To do these two actions in one line, we are going to use the built-in MS-DOS command: MOVE
As I said above, this is a minimal script that gives a simple idea to do the job. It requires lots of improvements in other to deal with large and complete python module which usually contains many classes and maybe some nested classes. It requires a better way to handle exceptions. You can make it corrupt a list of files inside a given folder. Even, you can write a small beautiful class that does all this job. The idea is simple, the constructor of the class should receive two parameters and will set them as class level attributes. Our two functions will be the two methods of the class. Before you create an instance of this class, you will need to prompt the user to enter these two parameters.
Personally, I added a constraint which says that documentation string may also be inside a function body. I mentioned it above by saying, documentation string could be anywhere in the module. Below is my approach to deal with that constraint:
Create a local Boolean variable (function level variable) which will be use as a flag. Let’s call it inside_docstring_block. It will be true once we encounter for the first time a triple quote and it will switch to false as soon as we encounter the next triple quote. All lines between these two sequences will not be modified by our script. There is something to precise, as we proceed based on each line; we may miss the case where we have a documentation string written in one line. To take this into account, we can basically check if we have triple quote at the beginning and at the end. Both the two styles of documentation string delimiter are considered.
Finally, I added a second parameter to isNew_function_declaration_line function. Since, it is a study script, i found interesting to track each line of the source module file. While the script is running we can see at the screen the decision taken for each line.
Below is the entry block of our final script.
Thanks & Enjoy!