Read Very Large File (7+ GB file) in Nodejs

How to read an extra-large text file (approx. 7+ GB) using Node.js

  idkblogs.com      May 1, 2021

Shubham Verma




Sometimes we need to read a very large file in our Node app, and handling such a file is a big deal. We need to write well-optimized code to read it. In this article, we will see how we can read very large files using Node.js.

Let's download the large file first:

You can download the large file (named planet-latest_geonames.gz) from this download file link. The compressed file is approx. 1.6 GB. After the download, extract the file.

Understand the file:

The downloaded file planet-latest_geonames.gz contains geo details. Let's look at the file details first. Let's open the file and see:

[GIF: the extracted file's properties, showing a size of more than 7 GB]

In the above gif, you can see the file size is more than 7 GB. Let's have a look at what is inside planet-latest_geonames.
The file contains a great many records, each holding several geo-related fields.


As you can see in the below snapshot:

[Snapshot: a few tab-separated rows from planet-latest_geonames.tsv]


If you observe the file (planet-latest_geonames.tsv) carefully, you can see that the values are separated with '\t' (tab), so we can split each line on '\t' programmatically and get a set of fields from every record.
The file is essentially a CSV file with tabs instead of commas, so we can parse it with any CSV parser that supports a custom separator. We will use an npm module called "csv-parser". Now it's time to write the code and read the file content.

Write the code to read the large file:

Now, we need to write optimized code to read the very large file. We'll use the fs module to read the file and csv-parser to parse it, because our file contains data in CSV (tab-separated) format. We will read this large file using a stream: we will create a file stream and read the large file through it.

Let's create a file named read_large_file.js, start writing the code, and then we will walk through it.

read_large_file.js:
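The original code listing did not survive in this copy of the article; the following is a minimal reconstruction sketch based on the explanations below, assuming csv-parser is installed, the extracted file is named planet-latest_geonames.tsv, and the column of interest is called "name" (all three are assumptions):

```javascript
const fs = require('fs');
const csv = require('csv-parser');

const results = [];

// Stream the file instead of loading it into memory all at once.
// separator '\t' is passed because the file is tab-separated.
const stream = fs.createReadStream('planet-latest_geonames.tsv')
  .pipe(csv({ separator: '\t' }));

stream.on('data', (row) => {
  stream.pause();                    // throttle while we handle this row
  results.push({ name: row.name }); // keep only the "name" column
  if (results.length % 100 === 0) {
    console.log(`${results.length} records processed`);
  }
  stream.resume();
});

stream.on('end', () => {
  console.log('Finished reading the file');
});

stream.on('error', (err) => {
  console.error('Failed to read the file:', err);
});
```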


Install the dependencies:

After creating the file read_large_file.js, we need to install the dependencies with the below command:
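The install command is missing from this copy; assuming csv-parser is the only third-party dependency (fs ships with Node.js), it would be along these lines:

```shell
npm install csv-parser
```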

Understand the code:

Now it's time to understand the code:
In the above code, we have imported the dependencies required to read the very large file in Node.js.
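The import snippet itself was lost in extraction; it would look like this, assuming the two modules named in this article:

```javascript
// fs ships with Node.js; csv-parser is the third-party parser from npm.
const fs = require('fs');
const csv = require('csv-parser');
```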


In the above code, we have created a file stream to read the file. This is a must when reading a very large file, and it is the optimized way to do so in Node.js.
We use the csv-parser module to break the file content into lines and get the data in chunks. We also store the stream in a variable called stream for further operations such as stream.pause() and stream.resume().
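A sketch of the snippet this paragraph describes (file name and the '\t' separator are assumptions based on the earlier discussion):

```javascript
const fs = require('fs');
const csv = require('csv-parser');

// Create a read stream over the large file and pipe it through the
// CSV parser; separator '\t' because the data is tab-separated.
const stream = fs.createReadStream('planet-latest_geonames.tsv')
  .pipe(csv({ separator: '\t' }));

// Keeping the stream in a variable lets us throttle it later:
//   stream.pause();  stream.resume();
```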


In the above code, we create an object with the property "name" and store the name field from each row in it. This code will build up an array of such objects.
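This step can be sketched as a small mapping function (the column name "name" is an assumption about the file's header):

```javascript
const results = [];

// Keep only the "name" field of each parsed row.
function collectName(row) {
  results.push({ name: row.name });
}

// In the real script this runs inside the stream's 'data' handler:
//   stream.on('data', collectName);
collectName({ name: 'Berlin', lon: '13.40', lat: '52.52' });
console.log(results); // [ { name: 'Berlin' } ]
```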


The "batch" step combines the results into batches; here we combine 100 records at a time to process further. You can set it to any number.


In the above code, we iterate over the data and log it to the console. This is where we can write our own logic to process the data as per the requirements.


The above code executes once the file has been read to the end.

Run the code:

Now that we understand the code, it's time to run it and see the result. To run the above code, use the below command:
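The run command is missing from this copy; for the file named read_large_file.js it would be:

```shell
node read_large_file.js
```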

And see the result:

[GIF: running the script in the terminal]


Now you can see in the above gif how our code reads the file and fetches the data:

[GIF: parsed rows being logged to the console]


Get more data from this file:

The file planet-latest_geonames.tsv contains a lot of data, but so far we are fetching only the names. We can fetch a richer JSON object by changing the mapping code:
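A sketch of such a richer mapping; the column names below are hypothetical, so check the file's actual header row, since the exact schema of planet-latest_geonames.tsv may differ:

```javascript
// Map a parsed row to a JSON object with several fields instead of
// only the name (field names are assumed for illustration).
function toRecord(row) {
  return {
    name: row.name,
    alternative_names: row.alternative_names,
    lon: row.lon,
    lat: row.lat,
  };
}

const record = toRecord({
  name: 'Berlin',
  alternative_names: 'Berlín',
  lon: '13.40',
  lat: '52.52',
});
console.log(record);
```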


After the above code update, you will get an array of JSON objects of the following kind:

So you have seen how we can read a very large file in Node.js.

Conclusion:

In this article, we learned how to read a very large file and get the data out of it.


Thank you for taking the time to read this article. If you're interested in Node.js or JavaScript, this link will help you a lot.

If you found this article helpful, please share its link with friends who need it; you can also share it in your technical social media groups. You can follow our social media page for more updates and the latest articles.
To read more about these technologies, please subscribe; you'll get a monthly newsletter with all the articles published during the previous month.