Numpy Array to Pandas DataFrame
If you want to convert a Numpy array to a Pandas DataFrame, you have three options. The first two boil down to passing a 1D or 2D Numpy array to pd.DataFrame(), and the last one leverages the built-in from_records() method. You'll learn all three approaches today, with a ton of hands-on examples.
To be perfectly honest, there are many more ways to convert a Numpy array to a DataFrame, but in reality, you only need these three. Everything else is just a modification and brings no novelty to the table.
Before proceeding, it would help to already know how to Convert Python List to Pandas DataFrame and how to Convert Python Dictionary to Pandas DataFrame. Reading these articles isn't mandatory, but it can't hurt.
Regarding library imports, you’ll need both Numpy and Pandas today, so stick these two lines at the top of your Python script or notebook:
import numpy as np
import pandas as pd
Convert Numpy Array to Pandas DataFrame - 1D Numpy Arrays
Think of 1D arrays as vectors or distinct features in the dataset. For example, a 1D array can represent age, first name, date of birth, or job title - but it can’t represent all of them. You’d need four 1D arrays to do so.
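Since each feature lives in its own 1D array, one way to combine several of them is a dictionary that maps column names to arrays. The sketch below uses made-up values (hypothetical data for three people) and also shows a common pitfall: a plain list of 1D arrays is read row by row, not column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four 1D arrays, one per feature
ages = np.array([34, 28, 45])
first_names = np.array(["Bob", "Mark", "Jane"])
birth_dates = np.array(["1990-04-12", "1996-09-03", "1979-01-25"], dtype="datetime64[D]")
job_titles = np.array(["Analyst", "Engineer", "Manager"])

# A dict of 1D arrays: each key becomes a column name, each array a column
by_column = pd.DataFrame({
    "Age": ages,
    "First name": first_names,
    "Date of birth": birth_dates,
    "Job title": job_titles,
})
print(by_column.shape)  # (3, 4): 3 observations, 4 features

# Pitfall: a list of 1D arrays is treated as a list of rows
by_row = pd.DataFrame([ages, first_names])
print(by_row.shape)  # (2, 3): each array became a row
```

The dictionary form keeps each 1D array as one feature, which matches how this section thinks about 1D arrays.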
Let’s see this in action. The following code snippet converts a 1D Numpy array to Pandas DataFrame:
arr = np.array([1, 2, 3])
data = pd.DataFrame(arr)
data
It’s just a vector of numbers, so the resulting DataFrame won’t be too interesting:
In case you want to convert a Numpy array to a Pandas DataFrame with a column name, you'll have to provide a value to the columns argument. It has to be list-like (a list works fine), even for a single column, so keep that in mind:
arr = np.array([1, 2, 3])
data = pd.DataFrame(arr, columns=["Number"])
data
The resulting DataFrame has a bit more context now:
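To see why the columns value has to be list-like, the sketch below passes a bare string and then a one-element list. In current pandas releases, the string is rejected with a TypeError rather than being treated as a single column name; the exact error message may vary between versions.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# A bare string is not a valid collection of column labels
try:
    pd.DataFrame(arr, columns="Number")
    string_rejected = False
except TypeError:
    string_rejected = True

# A one-element list works, even for a single column
data = pd.DataFrame(arr, columns=["Number"])
print(string_rejected)     # True
print(list(data.columns))  # ['Number']
```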
Now, DataFrames with only a single feature aren’t the most interesting, so let’s see how we can spice things up with multidimensional Numpy arrays.
Numpy Array to DataFrame - 2D Numpy Arrays
Think of 2D arrays as matrices. We have rows and columns, where each row represents the values for one observation, measured across multiple features (columns). Each column contains information on the same feature across multiple observations.
Let's go through a dummy example first, just so you can grasp how to leverage Pandas to create a DataFrame from an array:
arr = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
data = pd.DataFrame(arr, columns=["Num 1", "Num 2", "Num 3"])
data
The DataFrame has three observations (rows) measured through three features (columns):
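Because each row is an observation and each column a feature, the DataFrame's shape mirrors the array's shape, and for a homogeneous numeric array every column inherits the array's single dtype. A quick check, reusing the dummy matrix from above:

```python
import numpy as np
import pandas as pd

arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
data = pd.DataFrame(arr, columns=["Num 1", "Num 2", "Num 3"])

# (rows, columns) carry over directly from the array
print(arr.shape, data.shape)  # (3, 3) (3, 3)

# Every column shares the array's dtype
print((data.dtypes == arr.dtype).all())  # True

# Row 1 of the DataFrame holds the values of row 1 of the array
print(data.iloc[1].tolist())  # [4, 5, 6]
```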
If the dummy example doesn't paint the full picture, take a look at the following one. In it, we're declaring a 2D Numpy array of employees.
Each row is a single observation holding an employee's first name, last name, and email address. Each column is essentially a 1D array (vector) representing first names, last names, or emails across all records:
employees = np.array([
["Bob", "Doe", "bdoe@company.com"],
["Mark", "Markson", "mmarkson@company.com"],
["Jane", "Swift", "jswift@company.com"],
["Patrick", "Johnson", "pjohnson@company.com"]
])
data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])
data
Here’s the resulting DataFrame:
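The idea that each column is essentially a 1D array is easy to verify: selecting a column returns the same values as slicing the matching column out of the 2D array, and the same holds for rows. A minimal check using the employees data:

```python
import numpy as np
import pandas as pd

employees = np.array([
    ["Bob", "Doe", "bdoe@company.com"],
    ["Mark", "Markson", "mmarkson@company.com"],
    ["Jane", "Swift", "jswift@company.com"],
    ["Patrick", "Johnson", "pjohnson@company.com"]
])
data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])

# The "Email" column vs. column index 2 of the array
emails_from_df = data["Email"].tolist()
emails_from_arr = employees[:, 2].tolist()
print(emails_from_df == emails_from_arr)  # True

# Row 0 (one observation) vs. row 0 of the array
print(data.iloc[0].tolist() == employees[0].tolist())  # True
```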
And that's how you can convert both 1D and 2D Numpy arrays to Pandas DataFrames. Let's take a look at another way of doing the same thing, which is with the built-in from_records() method.
How to Convert Numpy Array to Pandas DataFrame with the from_records() Method
Pandas has a built-in method that allows you to convert a 2D Numpy array to a Pandas DataFrame. It's called from_records(), and it's a class method of the DataFrame class, so you call it as pd.DataFrame.from_records().
Truth be told, you don't have to use it: for a plain 2D array, it provides no advantage over the conversion approaches we've covered so far. But still, if you want a dedicated method, here's how to use it:
employees = np.array([
["Bob", "Doe", "bdoe@company.com"],
["Mark", "Markson", "mmarkson@company.com"],
["Jane", "Swift", "jswift@company.com"],
["Patrick", "Johnson", "pjohnson@company.com"]
])
data = pd.DataFrame.from_records(employees, columns=["First name", "Last name", "Email"])
data
The resulting Pandas DataFrame is identical to the one from the previous section:
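Where from_records() feels more at home is with actual records, such as a Numpy structured array, whose field names become column names automatically. The sketch below assumes a hypothetical employee ID field (the IDs and field names are made up for illustration) and uses the index parameter to promote that field to the row index.

```python
import numpy as np
import pandas as pd

# Hypothetical structured array: each element is one employee record
records = np.array(
    [
        (101, "Bob", "Doe"),
        (102, "Mark", "Markson"),
        (103, "Jane", "Swift"),
    ],
    dtype=[("id", "i8"), ("first_name", "U10"), ("last_name", "U10")],
)

# Field names become columns; the "id" field becomes the index
data = pd.DataFrame.from_records(records, index="id")

print(list(data.columns))           # ['first_name', 'last_name']
print(data.index.name)              # id
print(data.loc[102, "first_name"])  # Mark
```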
And that’s how you can convert a Numpy array to Pandas DataFrame. Let’s go over some commonly asked questions next.
Numpy Array to Pandas DataFrame Q&A
This section will walk you through some common questions regarding the Numpy array to Pandas DataFrame conversion.
Q: Can Pandas Work with Numpy Arrays?
A: Yes, Pandas can work with Numpy arrays just as well as with plain Python lists. You can pass a 1D or 2D Numpy array (or a dictionary of 1D arrays, one per column) to the pd.DataFrame() call to get a Pandas DataFrame. Just remember to specify the column names; otherwise, the columns will be labeled with a default integer range (0, 1, 2, ...).
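Here's what the default labeling looks like in practice, along with one way to attach names after the fact by assigning a list to the columns attribute. The zero-filled array and the names "a", "b", "c" are placeholders.

```python
import numpy as np
import pandas as pd

arr = np.zeros((2, 3))

# Without column names, pandas labels the columns 0, 1, 2, ...
data = pd.DataFrame(arr)
print(list(data.columns))  # [0, 1, 2]

# Names can be assigned later; the list length must match the column count
data.columns = ["a", "b", "c"]
print(list(data.columns))  # ['a', 'b', 'c']
```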
Q: How Can You Convert a Numpy Array Into a Pandas DataFrame?
A: You can use either a call to pd.DataFrame() or the pd.DataFrame.from_records() method. For a 2D Numpy array (matrix), both produce the same Pandas DataFrame.
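One way to confirm that the two calls agree is DataFrame.equals(), which compares labels, values, and dtypes. The sketch below sticks to a small float array to keep dtypes simple; for other inputs, it's worth checking data.dtypes on both results yourself.

```python
import numpy as np
import pandas as pd

arr = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])
cols = ["Num 1", "Num 2", "Num 3"]

via_constructor = pd.DataFrame(arr, columns=cols)
via_from_records = pd.DataFrame.from_records(arr, columns=cols)

# Same labels, same values, same dtypes
print(via_constructor.equals(via_from_records))  # True
```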
Q: How to Convert a Numpy Array to a DataFrame Column?
A: You can use Numpy arrays to add new columns to an existing Pandas DataFrame. For example, the following code snippet declares a Pandas DataFrame from a 2D Numpy array:
employees = np.array([
["Bob", "Doe", "bdoe@company.com"],
["Mark", "Markson", "mmarkson@company.com"],
["Jane", "Swift", "jswift@company.com"],
["Patrick", "Johnson", "pjohnson@company.com"]
])
data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])
data
To convert a Numpy array to a DataFrame column, declare a new Numpy array with one value per row and assign it to a new column. Here's the code:
years_of_experience = np.array([5, 3, 8, 12])
data["Years of Experience"] = years_of_experience
data
The DataFrame now has four columns instead of three:
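The assignment only works when the new array has exactly one value per row. The sketch below tries a too-short array under a hypothetical column name and shows pandas raising a ValueError instead of silently padding or truncating; the DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd

employees = np.array([
    ["Bob", "Doe", "bdoe@company.com"],
    ["Mark", "Markson", "mmarkson@company.com"],
    ["Jane", "Swift", "jswift@company.com"],
    ["Patrick", "Johnson", "pjohnson@company.com"]
])
data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])

# Matching length (4 values for 4 rows): works
data["Years of Experience"] = np.array([5, 3, 8, 12])

# Mismatched length (3 values for 4 rows): rejected
try:
    data["Hypothetical column"] = np.array([1, 2, 3])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print(err)  # the message reports both lengths

print(data.shape)  # (4, 4)
```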
And that’s all for today. Let’s make a short recap next.
Summing up Numpy Array to Pandas DataFrame
To conclude, Python's Pandas library provides a user-friendly API for converting most common data types into Pandas DataFrames, Numpy arrays being one of them. This article covered three ways to convert a Numpy array to a Pandas DataFrame, and these are all you need when working with Numpy.
There are some variations of these approaches, but they don't bring anything new to the table. Learn these three, and you'll be ready for any data analysis project coming your way.
Stay tuned to the Practical Pandas website, because next, we'll explore how to add rows and columns to a Pandas DataFrame.