Python Programming – File Processing

These Python file processing programs, shown with their outputs, will help you understand the language better.

A file is a named location on system storage that records data for later access. It stores data permanently so that operations such as opening, reading, and writing can be performed on it in the future. Usually, a file is kept on permanent storage media, e.g., a hard disk drive. Users access a file through its unique name and path for reading, writing, and modification. The most basic tasks in file manipulation are reading data from files and writing or appending data to files.
Python supports file handling and allows users to read, write, and append to files, along with many other file operations. In Python, file processing takes place in the following order:

Open a file that returns a file handle.
Use the handle to perform read or write action.
Close the file handle.

Before you read from or write to a file in Python, you need to open it. Once the read/write operation completes, you should close the file to free the resources tied to it.
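The open, use, close sequence described above can be sketched as follows. This is a minimal illustration, not a definitive pattern: the file name demo.txt and the temporary directory are assumptions made so the example is self-contained.

```python
import os
import tempfile

# Hypothetical demo file, placed in a temporary directory (an assumption
# made only so that this sketch can run anywhere).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Step 1: open the file; open() returns a file handle.
handle = open(demo_path, "w")
try:
    # Step 2: use the handle to perform the write.
    handle.write("hello, file\n")
finally:
    # Step 3: close the handle, even if the write failed.
    handle.close()

# The same three steps, this time for reading.
handle = open(demo_path, "r")
try:
    content = handle.read()
finally:
    handle.close()

print(content)
```

The try/finally blocks are one way to make sure step 3 always runs, whatever happens in step 2.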

A file is a contiguous set of bytes on disk used to store related information. This data is organized in a specific format and can be as simple as a text file or as complicated as a program executable.

Types of File

There are two types of files in Python. They are:

Text file
Binary file

A text file is simply a file that stores sequences of characters using an encoding such as UTF-8 or Latin-1, whereas a binary file stores data in the same format as in computer memory. Python source code and HTML files are examples of text files, whereas executable files, images, and audio files are examples of binary files. On disk, both types of files are stored as a sequence of 0s and 1s.

So at the lowest level, a text file is also a collection of bytes. The only difference is that when a text file is opened, the data is decoded using the same encoding scheme it was encoded in. In the case of binary files, no such decoding happens.
A text file is the most common type of file. Each line of a text file is terminated by a special character, known as End of Line (EOL). In Python, the newline character (\n) is the default EOL terminator.

Binary files store data as 0s and 1s, i.e., in a machine-readable format. The data is handled in bytes, each of which occupies 8 bits.
Since binary files store data in binary form (0s and 1s) rather than as lines of text, there is no EOL character. Reading a file in binary mode returns bytes. This is the mode to use when dealing with non-text files such as image files, document files, video files, audio files, executable files, etc.
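To make the difference concrete, here is a small sketch that reads the same file once in text mode and once in binary mode. The file name sample.txt, its content, and the use of a temporary directory are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical sample file in a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write one line containing a non-ASCII character, encoded as UTF-8.
# newline="" writes "\n" unchanged on every platform.
with open(sample_path, "w", encoding="utf-8", newline="") as f:
    f.write("café\n")

# Text mode: the bytes are decoded back into a string.
with open(sample_path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: the raw bytes are returned, with no decoding.
with open(sample_path, "rb") as f:
    as_bytes = f.read()

print(type(as_text), len(as_text))    # str, 5 characters
print(type(as_bytes), len(as_bytes))  # bytes, 6 bytes ("é" takes two in UTF-8)
```

The string and the bytes object describe the same data on disk; only the interpretation applied when reading differs.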

All binary files follow a specific format. You can open some binary files in a normal text editor, but you cannot read the content inside them, because binary files are encoded in a binary format that cannot be understood by us. A binary format is understood only by a computer or machine. To open such binary files, you need a specific type of software. For example, you need Microsoft Word to open .doc binary files, PDF reader software to open .pdf binary files, photo editing software to read image files, and so on.

File Opening in Various Modes

The access mode determines how the file is opened, i.e., for reading, writing, appending, etc. You specify the access mode of a file through the mode argument: use 'r' to read the file, and 'w' or 'a' to write or append, respectively. If the mode argument is not passed, Python assumes 'r' by default.

Mode is an optional string that specifies the mode in which the file is opened. The mode you choose will depend on what you wish to do with the file. Table 11.1 shows the details about the access mode to open a file.

Table 11.1: Access modes for opening a file
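As a rough illustration of how the three basic modes behave, the sketch below writes, appends to, and reads one file. The file name log.txt and the temporary directory are assumptions; the with statement used here simply closes the file automatically at the end of each block.

```python
import os
import tempfile

# Hypothetical log file in a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")

# 'w' creates the file (or empties an existing one) before writing.
with open(log_path, "w") as f:
    f.write("first\n")

# 'a' keeps the existing content and writes at the end of the file.
with open(log_path, "a") as f:
    f.write("second\n")

# 'r' is the default mode, so it can be omitted.
with open(log_path) as f:
    lines = f.read().splitlines()

# Opening with 'w' again discards what was there before.
with open(log_path, "w") as f:
    f.write("replaced\n")

with open(log_path, "r") as f:
    after_truncate = f.read()

print(lines)
print(after_truncate)
```

The last two steps show why 'a' rather than 'w' is the usual choice when existing data must be preserved.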

Opening and Reading a File

So far, you have seen Python programs that take input from the keyboard and deliver output on the screen. Such programs run for a short period of time, give some output, and then their data disappears. When you run them again, you have to provide the inputs again. This is because the data is held in primary memory, which is temporary in nature. Programs that need their data to persist must store it on a permanent storage device (say, a hard disk), so that the same data can be retrieved after the program is closed or restarted. In this chapter, you will learn to take input from files and write results to files. The data stored within a file is known as persistent data because it is permanently stored in the system.

To work with a file in Python, you first need some way to associate the file on disk with a variable in Python. This process is called opening a file. You begin by telling Python where the file is. The location of your file is often referred to as the file path, and a file path is required whenever you access a file on an operating system. If the file is in the current working directory, you can just provide the file name. If the file resides in any other directory, you have to provide the full path along with the file name. A file path is broken up into three major parts:

Folder Path: It specifies the file folder location.
File Name: It is the actual name of the file.
Extension: It is an extension with a period (.) to indicate the file type.
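The three parts listed above can be picked out of a path with the standard pathlib module. This is only a sketch: PureWindowsPath is used here as an assumption so that the Windows-style example path is split the same way on any operating system, without touching the disk.

```python
from pathlib import PureWindowsPath

# Example path, used purely for illustration; no file is opened.
path = PureWindowsPath("C:/Documents/python/abc.txt")

folder = path.parent     # folder path: C:\Documents\python
name = path.stem         # file name without the extension: abc
extension = path.suffix  # extension, including the period: .txt

print(folder)
print(name)
print(extension)
print(path.name)         # file name together with its extension: abc.txt
```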

Python has a built-in function called open() to open a file. You can specify the mode while opening a file: in the mode, you state whether you want to read ('r'), write ('w'), or append ('a') to the file, and whether you want to open the file in text mode or binary mode. The default is reading in text mode, in which you get strings when reading from the file. The syntax is:
file_object = open(file_name, mode)

Here, file_name is the name or the location of the file that you want to open, and it should include the file extension.
Suppose you created a file named "abc.txt" at the location "C:/Documents/python/" with the following contents:
This is the first sentence of the code.
This is the second sentence of the code.
This is the third sentence of the code.
The following Python program, shown in Figure 11.1, asks the user to enter the file name, reads and displays the content, and finally closes the file.

Example
Program to read and display content.

# Program to read and display content
import sys

print("Enter 'x' for exit.")
filename = input("Enter filename with extension to read: ")
if filename == 'x':
    sys.exit()
else:
    # Open the file in read mode, show its content, then close it
    r = open(filename, "r")
    print("\nThe file,", filename, "opened successfully!")
    print("The file", filename, "contains:\n")
    print(r.read())
    r.close()
RUN
>>>
Enter 'x' for exit.
Enter filename with extension to read: c:\documents\python\abc.txt

The file, c:\documents\python\abc.txt opened successfully!
The file c:\documents\python\abc.txt contains:

This is the first sentence of the code.
This is the second sentence of the code.
This is the third sentence of the code.
>>>
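The program above stops with an error if the user types the name of a file that does not exist. One possible way to guard against that is sketched below; the helper name read_text_file and the demo file names are hypothetical, chosen only for this illustration.

```python
import os
import tempfile


def read_text_file(filename):
    """Return the content of a text file, or None if it cannot be opened."""
    try:
        f = open(filename, "r")
    except OSError as err:
        # Covers a missing file, a missing folder, or a permission problem.
        print("Could not open", filename, ":", err)
        return None
    try:
        return f.read()
    finally:
        f.close()


# Hypothetical demo files inside a fresh temporary directory.
demo_dir = tempfile.mkdtemp()

# This file was never created, so the helper reports the problem.
missing = read_text_file(os.path.join(demo_dir, "abc.txt"))

# This file is created first, so it can be read back.
existing_path = os.path.join(demo_dir, "xyz.txt")
w = open(existing_path, "w")
w.write("hello\n")
w.close()
found = read_text_file(existing_path)

print(missing, repr(found))
```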

Closing the file

Once all the operations on the file are done, you should close it using the close() method. Any buffered data that has not yet been written is flushed to the file when close() is called, and the file object can no longer be used for reading or writing afterwards. The syntax of close() is:
file_object.close()
Python may also close a file automatically when the variable that refers to it is reassigned to another file, but it is safer not to rely on this.

It is good practice to close a file as soon as your work with it is finished. When a file is closed, the system frees up all the resources allocated to it.
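A common way to make sure a file is always closed is the with statement, which calls close() automatically when its block ends. The sketch below assumes a hypothetical file named notes.txt in a temporary directory and uses the closed attribute of the file object to show the state before and after.

```python
import os
import tempfile

# Hypothetical notes file in a temporary directory.
notes_path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(notes_path, "w") as f:
    f.write("saved on close\n")
    inside = f.closed   # False: the file is still open inside the block

after = f.closed        # True: leaving the block closed the file

# The data was flushed to disk when the file was closed.
with open(notes_path) as f:
    saved = f.read()

print(inside, after, repr(saved))
```

With this form there is no explicit close() call to forget, and the file is closed even if an error occurs inside the block.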

Writing onto a File

The write() method writes any string to an open file. The write() method does not add a newline character ('\n') to the end of the string. If you want the newline character, you have to add it manually. Its syntax is:
file_object.write(string)
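The newline behaviour can be checked with a short sketch. The file name out.txt and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical output file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(out_path, "w") as f:
    f.write("one")
    f.write("two")              # lands on the same line as "one"
    f.write("\n")               # the newline has to be written explicitly
    count = f.write("three\n")  # write() returns the number of characters written

with open(out_path) as f:
    written = f.read()

print(repr(written))
print(count)
```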
To write content to a file in Python, ask the user for the file name along with its extension, such as filename.txt. Then open that file in write mode, which creates a file with that name if it does not exist (and erases the previous content if it does), and ask the user to enter the sentences that will become the content of the file.

The following Python program asks the user for the name of the file to open or create, and then for three sentences to be written to the file (see Figure 11.2).

Example
Program to write to file.

# Program to write to a file
import sys

print("Enter 'x' for exit.")
filename = input("Enter filename to create & write content: ")
if filename == 'x':
    sys.exit()
else:
    # Open the file in write mode; it is created if it does not exist
    w = open(filename, "w")
    print("The file,", filename, "created successfully!")
    print("Enter 3 sentences to write on the file: ")
    s1 = input()
    s2 = input()
    s3 = input()
    # write() adds no newline, so one is written between the sentences
    w.write(s1)
    w.write("\n")
    w.write(s2)
    w.write("\n")
    w.write(s3)
    w.close()
    print("Content written successfully")
RUN
>>>
Enter 'x' for exit.
Enter filename to create & write content: c:\documents\python\xyz.txt
The file, c:\documents\python\xyz.txt created successfully!
Enter 3 sentences to write on the file:
I am writing my first sentence.
This is my second sentence.
And, finally my last sentence.
Content written successfully
>>>

Command Line Arguments

Python command-line arguments are input parameters passed to a script when it is executed. In other words, they are arguments sent to the program being called. A program can take any number of command-line arguments. Python supports the creation of programs that can be run on the command line, complete with command-line arguments. Almost every modern programming language can take in arguments from the command line. It is a very important feature because it allows for dynamic inputs from users, whether they wrote the program or not.

Python comes with several different tools that allow your code to take user input from the command line. The three most common ones are sys.argv, getopt, and argparse.

sys.argv

The Python sys module stores the command-line arguments in a list, which you can access as sys.argv. The first element in the list is the name of the script. The arguments always arrive as strings. This is a simple and useful way to read command-line arguments. A program that uses sys.argv can accept an arbitrary number of arguments passed from the command line (or terminal) when it is executed, and can, for example, print all the arguments that were passed along with their total number.
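A program of the kind described above might look like the following sketch. The helper name describe_arguments is hypothetical; it takes the list as a parameter so that the logic can also be tried without a real command line.

```python
import sys


def describe_arguments(argv):
    """Split an argv-style list into script name, arguments, and their count."""
    script_name = argv[0]   # the first element is the name of the script
    arguments = argv[1:]    # everything after it was passed by the user
    return script_name, arguments, len(arguments)


if __name__ == "__main__":
    name, args, count = describe_arguments(sys.argv)
    print("Script name:", name)
    print("Arguments:", args)
    print("Total number of arguments:", count)
```

Saved under a name such as demo.py (an assumed name) and run as "python demo.py hello 7", the program would receive both arguments as strings, so "7" needs int() before it can be used as a number.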

Getopt Module

The Python getopt module is useful for parsing command-line arguments when you want the user to enter some options too.
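As a rough sketch of how getopt handles options, the example below parses a help flag, a verbose flag, and an output option that takes a value. The option names and the helper parse_options are assumptions chosen for illustration; in a real script the list passed in would typically be sys.argv[1:].

```python
import getopt


def parse_options(args):
    """Parse -h/--help, -v/--verbose and -o FILE/--output FILE from a list of strings."""
    # "ho:v": h and v are flags; the colon after o means it takes a value.
    opts, remaining = getopt.getopt(args, "ho:v", ["help", "output=", "verbose"])
    settings = {"help": False, "output": None, "verbose": False}
    for option, value in opts:
        if option in ("-h", "--help"):
            settings["help"] = True
        elif option in ("-o", "--output"):
            settings["output"] = value
        elif option in ("-v", "--verbose"):
            settings["verbose"] = True
    return settings, remaining


# Simulated command line, used instead of sys.argv[1:] for this sketch.
settings, remaining = parse_options(["-v", "--output", "result.txt", "input.txt"])
print(settings)
print(remaining)
```

An unknown option makes getopt.getopt() raise getopt.GetoptError, which a script would normally catch in order to print a usage message.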

Argparse Module

The Python argparse module is the preferred and recommended way to parse command-line arguments in Python. It provides many features, such as positional arguments, default values for arguments, help messages, and specifying the data type of an argument. It serves the same purpose as the getopt module, but getopt is a little more complicated and usually needs more code for the same task.
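The features mentioned above can be sketched in a few lines. The argument names (filename, --lines, --verbose) and the helper build_parser are hypothetical; parse_args() is given an explicit list here, whereas calling it with no arguments would read sys.argv.

```python
import argparse


def build_parser():
    """Create a parser with a positional argument, a typed option, and a flag."""
    parser = argparse.ArgumentParser(description="Display the content of a text file.")
    # Positional argument: required, identified by its position.
    parser.add_argument("filename", help="path of the file to read")
    # Optional argument with a data type and a default value.
    parser.add_argument("--lines", type=int, default=10,
                        help="number of lines to show (default: 10)")
    # Flag: True when given, False otherwise.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra information")
    return parser


# Simulated command line for this sketch.
args = build_parser().parse_args(["abc.txt", "--lines", "3"])
print(args.filename, args.lines, args.verbose)
```

The help message is generated automatically: running such a script with -h prints the description and the help text of each argument.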
