Papa Parse: Lightning Fast CSV Parsing Experience

Published in codeburst, Aug 15, 2020


Overview

With so many ways to parse CSV data, and with the data in those files so often inconsistent, have you ever wished for a simple and efficient package that does the job for you? Meet Papa Parse, a robust JavaScript library that claims to be the fastest in-browser CSV parser. It is a one-stop shop for parsing CSV to JSON!

Highlights

Before getting into the features of Papa Parse, let’s look at how we can include this package in our code:

/* babel or ES6 */
import Papa from 'papaparse';

/* node or require js */
const Papa = require('papaparse');

The general syntax of use

For a CSV string:

var parsedOutput = Papa.parse(stringOfCsv[, config])

There are numerous configurations to choose from, best explained in the Papa Parse documentation here.

For a file:

Papa.parse(myFileInput.files[0], {
	complete: function(parsedOutput) {
		console.log(parsedOutput);
	}
});

Because reading a file is asynchronous, a callback must be supplied to collect the results.

The same is the case when we want to fetch the CSV file from a URL:

Papa.parse(csvUrl, {
	download: true,
	complete: function(parsedOutput) {
		console.log(parsedOutput);
	}
});
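
If callbacks are awkward in your codebase, the asynchronous forms above can be wrapped in a Promise. The following is a minimal sketch and not part of Papa Parse itself: parseAsync is a hypothetical helper name, and the library object is passed in as Papa so that the snippet makes no assumption about how it was imported. It relies only on the complete and error callbacks.

```javascript
// Hypothetical helper: wraps the callback-style Papa.parse in a Promise.
// `Papa` is the Papa Parse object, `input` is a File or a URL string.
function parseAsync(Papa, input, config) {
  return new Promise(function (resolve, reject) {
    Papa.parse(input, Object.assign({}, config, {
      complete: resolve, // receives the parsed output ({ data, errors, meta })
      error: reject      // receives the error if the file or URL cannot be read
    }));
  });
}
```

It could then be used as parseAsync(Papa, csvUrl, { download: true }).then(function (parsedOutput) { console.log(parsedOutput); });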

The parsed output consists of three parts: the data array, the errors array and the meta object. The data array holds the parsed CSV rows.

With header: false (the default), data is an array of arrays, one inner array per row. With header: true, the first row is treated as the field names and data becomes an array of objects keyed by those names.

The errors array contains information about any errors encountered while parsing the CSV. The meta object holds metadata about the parse, such as the delimiter, the line break sequence and the field names, to name a few.
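
To make the two shapes of data concrete, the sketch below converts header: false-style output into header: true-style output by hand. rowsToObjects is a hypothetical helper written for this illustration, not a Papa Parse function, and it only approximates what the header option does. The sample rows are made up.

```javascript
// Hypothetical helper, not part of Papa Parse: turns header:false-style
// output (an array of arrays) into header:true-style output (keyed objects).
function rowsToObjects(rows) {
  var fields = rows[0]; // the first row holds the column names
  return rows.slice(1).map(function (row) {
    var record = {};
    fields.forEach(function (name, i) {
      record[name] = row[i];
    });
    return record;
  });
}

// Shape of `data` with header: false
var asArrays = [
  ['name', 'age'],
  ['Ada', '36'],
  ['Linus', '28']
];

// Shape of `data` with header: true
var asObjects = rowsToObjects(asArrays);
console.log(asObjects);
// [ { name: 'Ada', age: '36' }, { name: 'Linus', age: '28' } ]
```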

Auto delimiter detection

There are many scenarios in which you wouldn’t be sure of the delimiter used in the CSV. Not to worry! Papa Parse has an auto delimiter detection feature in which the first few rows of the CSV are scanned to automatically figure out the delimiter used in the CSV file.

The delimiter which was considered for parsing can always be checked in the result output’s meta object under the delimiter field.

var output = Papa.parse(stringOfCsv); // input: a,b,c,d,e
console.log(output.meta.delimiter); // delimiter: ,

If you want to restrict auto-detection to a specific set of candidate delimiters, there’s a config option called delimitersToGuess, which takes a list of delimiters to choose from. Its default value is:

delimitersToGuess : [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]

Where Papa.RECORD_SEP and Papa.UNIT_SEP are read-only properties used to represent the ASCII Code 30 and ASCII Code 31 respectively as delimiters.
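
As a rough sketch of how the candidate list might be narrowed, the function below only lets the parser choose between a semicolon and a tab. parseSemicolonOrTab is a hypothetical name, and Papa is passed in as an argument so that no particular import style is assumed.

```javascript
// `Papa` is the Papa Parse object; parseSemicolonOrTab is a made-up name
// used only for this illustration.
function parseSemicolonOrTab(Papa, stringOfCsv) {
  var output = Papa.parse(stringOfCsv, {
    // Only these candidates are considered during auto-detection.
    delimitersToGuess: [';', '\t']
  });
  console.log(output.meta.delimiter); // the delimiter that was actually used
  return output;
}
```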

Ability to parse huge file inputs

If the input file is really large, Papa Parse can stream the input and deliver the output row by row. This avoids loading the whole file into memory, which might otherwise crash the browser. To enable streaming, provide a step function in the config; it is called with the result for each row.

Papa.parse("http://csvexample.com/enormous.csv", {
	download: true,
	step: function(row, parser) {
		console.log("Row:", row.data);
	},
	complete: function() {
		console.log("All done!");
	}
});

The second input to the step function is parser. The parser object can be used to abort, pause, or resume the CSV parsing.

parser.abort();
parser.pause();
parser.resume();

Do not use parser.pause() and parser.resume() when parsing with Web Workers: the worker would have to wait for a continue signal from the main thread, which makes the whole experience sluggish. More on that here.
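
Returning to the parser object passed to step, the sketch below stops parsing once enough rows have been seen. collectFirstRows is a hypothetical helper, not a Papa Parse function, and Papa is passed in so that no import style is assumed. It resolves as soon as the limit is reached and also on complete, so it does not depend on whether complete fires after an abort.

```javascript
// Hypothetical helper: collects at most `limit` rows, then stops parsing.
// `Papa` is the Papa Parse object; `input` is a CSV string, File or URL.
function collectFirstRows(Papa, input, limit, config) {
  return new Promise(function (resolve) {
    var rows = [];
    Papa.parse(input, Object.assign({}, config, {
      step: function (row, parser) {
        if (rows.length >= limit) return; // ignore anything after the limit
        rows.push(row.data);
        if (rows.length === limit) {
          parser.abort(); // stop reading the rest of the input
          resolve(rows);
        }
      },
      complete: function () {
        resolve(rows); // the input ended before the limit was reached
      }
    }));
  });
}
```

For a fixed number of rows the preview config is simpler; abort() is more useful when the stopping condition depends on the data itself.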

Multithreading option in Papa Parse

If you are worried that your webpage will become unresponsive because a CSV parsing script runs for a long time on the main/UI thread, Papa Parse provides a config called worker. When it is set to true, the CSV is parsed on a worker thread. Using a worker thread might slow the parsing down a little, but it ensures that your website remains responsive.

Papa.parse("http://csvexample.com/enormous.csv", {
	download: true, // needed so the string is fetched as a URL rather than parsed as CSV text
	worker: true,
	step: function(row) {
		console.log("Row:", row.data);
	},
	complete: function() {
		console.log("All done!");
	}
});

The worker thread is built on the standard Web Worker interface provided by the browser.

Comments in your CSV?

Bizarre as it sounds, if your CSV contains comment lines that you do not want parsed, you can set the comments config to the string that marks the start of a comment.

Papa.parse("http://csvexample.com/csv.csv", {
	download: true,
	comments: "#", // All lines starting with '#' are treated as comments and ignored by the parser.
	complete: function(parsedOutput) {
		console.log(parsedOutput);
	}
});

Type Conversion in Papa Parse

By default, all lines and fields are parsed as strings. But if you want to preserve the numeric and boolean types, Papa Parse provides an option called dynamicTyping to automatically enable the type conversion for your data.

Papa.parse("http://csvexample.com/csv.csv", {
	download: true,
	dynamicTyping: true,
	complete: function(parsedOutput) {
		console.log(parsedOutput);
	}
});

When dynamicTyping is true, numeric and boolean data in the CSV are converted to their respective types. Numeric data must conform to the definition of a decimal literal. Numerical values greater than 2⁵³ or less than -2⁵³ are not converted, in order to preserve precision. European-formatted numbers must have their commas and dots swapped first.

dynamicTyping also accepts an object or a function. An object’s values should be booleans indicating whether dynamic typing applies to each column number (or header name, if using headers). A function receives the field number (or name, if using headers) as its first argument and should return a boolean for it.
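
The object and function forms might look like the following config sketches. The column names age and zip are made up for this example; a typical reason to opt a column out is an identifier whose leading zeros should survive as text.

```javascript
// Object form: enable conversion per column (header names, since header is true).
// The column names "age" and "zip" are hypothetical.
var perColumnConfig = {
  header: true,
  dynamicTyping: { age: true, zip: false } // zip values stay strings
};

// Function form: receives the field name (or number) and returns a boolean.
var functionConfig = {
  header: true,
  dynamicTyping: function (field) {
    return field !== 'zip'; // convert every column except zip
  }
};
```

Either object can be passed as the second argument to Papa.parse().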

Converting JSON to CSV format

Another wonderful feature of Papa Parse is its ability to convert JSON to CSV. So far we have only seen the parse() function. For the reverse direction, Papa Parse provides unparse().

The output of unparse() is a neatly formatted CSV string. The general syntax is:

Papa.unparse(data[, config])

The data field can be an array of objects, an array of arrays or an object with header fields and data. The optional config for unparse(), much like the one for parse(), has a wide range of options to choose from. You can check them out here.
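
As an illustration, the three accepted input shapes could look like this. The sample values are made up, toCsv is a hypothetical wrapper, and Papa is passed in so that no import style is assumed. All three inputs are intended to describe the same small table.

```javascript
// 1. Array of arrays: rows are written out in order, header row first.
var asArrays = [
  ['name', 'age'],
  ['Ada', 36]
];

// 2. Array of objects: the keys of the first object supply the header row.
var asObjects = [
  { name: 'Ada', age: 36 }
];

// 3. Object with explicit header fields and data rows.
var asFieldsAndData = {
  fields: ['name', 'age'],
  data: [['Ada', 36]]
};

// Hypothetical wrapper; `Papa` is the Papa Parse object.
function toCsv(Papa, data) {
  return Papa.unparse(data); // returns a CSV string
}
```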

Error Handling

The last feature we will be discussing in this article is about the error handling by Papa Parse.

As mentioned at the top of the article, the parsed results consist of three components: data, errors and meta.

Each entry in the errors array is structured in the following way:

{
	type: "",     // A generalization of the error
	code: "",     // Standardized error code
	message: "",  // Human-readable details
	row: 0,       // Row index of parsed data where error is
}

One way of extracting the errors:

var results = Papa.parse(csvString);
results.errors.forEach(function(error) {
	console.log(error.type, error.code, error.message, error.row);
});

Even if you do encounter errors while parsing, that’s no indication that the parsing of the CSV file failed.
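
When a file produces many errors, it can help to group them before deciding what to do. The sketch below is one possible approach: summarizeErrors is a hypothetical helper, and sampleResults is a hand-written object in the shape described above, with an illustrative message rather than the library’s exact wording.

```javascript
// Hypothetical helper: groups a result's errors by code, listing the affected rows.
function summarizeErrors(results) {
  var summary = {};
  results.errors.forEach(function (err) {
    if (!summary[err.code]) summary[err.code] = [];
    summary[err.code].push(err.row);
  });
  return summary;
}

// Hand-written result object for illustration only.
var sampleResults = {
  data: [{ a: '1' }],
  errors: [
    { type: 'FieldMismatch', code: 'TooFewFields', message: 'Too few fields', row: 0 }
  ],
  meta: {}
};

console.log(summarizeErrors(sampleResults)); // { TooFewFields: [ 0 ] }
```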

A few useful configs for parsing

Some other notable parsing configs, which we will only mention here, are:

newline - The newline sequence
quoteChar - The character used to quote fields
escapeChar - The character used to escape the quote character within a field
preview - If > 0, only that many rows will be parsed
transformHeader - A function to apply on each header. Requires header:true
chunk - A callback function, identical to step, which activates streaming

And many more :)
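
As a small sketch, two of the configs listed above could be combined to peek at the start of a file with normalized header names. previewConfig is just an illustrative variable name.

```javascript
var previewConfig = {
  header: true,
  preview: 10, // parse only the first 10 rows
  transformHeader: function (header) {
    return header.trim().toLowerCase(); // " First Name " becomes "first name"
  }
};
```

The object can be passed as the second argument to Papa.parse().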

Bonus Utility Functions

Below are some React and Angular implementations for using Papa Parse to parse CSV data:

React Hook

import { useState, useEffect } from 'react';
import Papa from 'papaparse';

function useGoogleSheetData(url) {
  const [rows, setRows] = useState([]);
  useEffect(() => {
    // Download and parse the CSV whenever the URL changes.
    Papa.parse(url, {
      download: true,
      header: true,
      complete: function(results) {
        setRows(results.data);
      }
    });
  }, [url]);
  return rows;
}

and we would use it as:

const rows = useGoogleSheetData("<my_csv_url>");

Angular Observable

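// At the top of the file:
import { Observable, EMPTY } from 'rxjs';
import { catchError } from 'rxjs/operators';
import { parse } from 'papaparse';

// Inside the component or service class:
useGoogleSheetData = (url: string): Observable<any> => {
    return new Observable((observer) => {
      parse(url, {
        download: true,
        header: true,
        complete: (result) => {
          observer.next(result);
          observer.complete();
        },
        error: (error) => {
          // error() already terminates the stream, so no complete() is needed.
          observer.error(error);
        }
      });
    });
};

Can be used as below:

this.useGoogleSheetData("<my_csv_url>").pipe(catchError((error) => {
    console.error(error);
    return EMPTY; // catchError must return an observable
})).subscribe((data) => {
    this.sheetData = data;
});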


Conclusion

Given the features described above, and the many more Papa Parse has to offer (you can check them out here), this package is clearly the real deal. Its ability to handle huge files and unstructured data, along with its support for readable streams as input (used in Node.js), is what makes it stand out from the rest of the CSV parsing packages.

Hope you’ve got a good insight into what Papa Parse is all about and how you can use it for your future projects :-)

Check out the package and some reading materials

Video review of the package

Video review of the package with interesting use cases and in-depth exploration of the features coming soon! For more related content, check out Unpackaged Reviews.

Disclosures

The content and evaluation scores mentioned in this article/review are subjective and are the personal opinion of the authors at Unpackaged Reviews, based on everyday usage and research on popular developer forums. They do not represent any company’s views and are not impacted by any sponsorships/collaboration.