This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Reading and Writing Files With Pandas

Pandas is a powerful and flexible Python package that allows you to work with labeled and time series data. It also provides statistics methods, enables plotting, and more. One crucial feature of Pandas is its ability to write and read Excel, CSV, and many other types of files. Functions like the Pandas read_csv() function enable you to work with files effectively. You can use them to save the data and labels from Pandas objects to a file and load them later as Pandas Series or DataFrame instances.

In this tutorial, you'll learn:

  • What the Pandas IO tools API is
  • How to read and write data to and from files
  • How to work with various file formats
  • How to work with big data efficiently

Let's start reading and writing files!

Installing Pandas

The code in this tutorial is executed with CPython 3.7.4 and Pandas 0.25.1. It would be beneficial to make sure you have the latest versions of Python and Pandas on your machine. You might want to create a new virtual environment and install the dependencies for this tutorial.

First, you'll need the Pandas library. You may already have it installed. If you don't, then you can install it with pip:

    $ pip install pandas

Once the installation process completes, you should have Pandas installed and ready.

Anaconda is an excellent Python distribution that comes with Python, many useful packages like Pandas, and a package and environment manager called Conda. To learn more about Anaconda, check out Setting Up Python for Machine Learning on Windows.

If you don't have Pandas in your virtual environment, then you can install it with Conda:

    $ conda install pandas

Conda is powerful as it manages the dependencies and their versions. To learn more about working with Conda, you can check out the official documentation.

Preparing Data

In this tutorial, you'll use the data related to 20 countries. Here's an overview of the data and sources you'll be working with:

  • Country is denoted by the country name. Each country is in the top 10 list for either population, area, or gross domestic product (GDP). The row labels for the dataset are the three-letter country codes defined in ISO 3166-1. The column label for the dataset is COUNTRY.

  • Population is expressed in millions. The data comes from a list of countries and dependencies by population on Wikipedia. The column label for the dataset is POP.

  • Area is expressed in thousands of kilometers squared. The data comes from a list of countries and dependencies by area on Wikipedia. The column label for the dataset is AREA.

  • GDP is expressed in millions of U.S. dollars, according to the United Nations data for 2017. You can find this data in the list of countries by nominal GDP on Wikipedia. The column label for the dataset is GDP.

  • Continent is either Africa, Asia, Oceania, Europe, North America, or South America. You can find this information on Wikipedia as well. The column label for the dataset is CONT.

  • Independence day is a date that commemorates a nation's independence. The data comes from the list of national independence days on Wikipedia. The dates are shown in ISO 8601 format. The first four digits represent the year, the next two numbers are the month, and the last two are for the day of the month. The column label for the dataset is IND_DAY.

This is how the data looks as a table:

     COUNTRY         POP      AREA       GDP  CONT       IND_DAY
CHN  China       1398.72   9596.96  12234.78  Asia
IND  India       1351.16   3287.26   2575.67  Asia       1947-08-15
USA  US           329.74   9833.52  19485.39  N.America  1776-07-04
IDN  Indonesia    268.07   1910.93   1015.54  Asia       1945-08-17
BRA  Brazil       210.32   8515.77   2055.51  S.America  1822-09-07
PAK  Pakistan     205.71    881.91    302.14  Asia       1947-08-14
NGA  Nigeria      200.96    923.77    375.77  Africa     1960-10-01
BGD  Bangladesh   167.09    147.57    245.63  Asia       1971-03-26
RUS  Russia       146.79  17098.25   1530.75             1992-06-12
MEX  Mexico       126.58   1964.38   1158.23  N.America  1810-09-16
JPN  Japan        126.22    377.97   4872.42  Asia
DEU  Germany       83.02    357.11   3693.20  Europe
FRA  France        67.02    640.68   2582.49  Europe     1789-07-14
GBR  UK            66.44    242.50   2631.23  Europe
ITA  Italy         60.36    301.34   1943.84  Europe
ARG  Argentina     44.94   2780.40    637.49  S.America  1816-07-09
DZA  Algeria       43.38   2381.74    167.56  Africa     1962-07-05
CAN  Canada        37.59   9984.67   1647.12  N.America  1867-07-01
AUS  Australia     25.47   7692.02   1408.68  Oceania
KAZ  Kazakhstan    18.53   2724.90    159.41  Asia       1991-12-16

You may notice that some of the data is missing. For example, the continent for Russia is not specified because it spreads across both Europe and Asia. There are also several missing independence days because the data source omits them.

You can organize this data in Python using a nested dictionary:

    data = {
        'CHN': {'COUNTRY': 'China', 'POP': 1_398.72, 'AREA': 9_596.96,
                'GDP': 12_234.78, 'CONT': 'Asia'},
        'IND': {'COUNTRY': 'India', 'POP': 1_351.16, 'AREA': 3_287.26,
                'GDP': 2_575.67, 'CONT': 'Asia', 'IND_DAY': '1947-08-15'},
        'USA': {'COUNTRY': 'US', 'POP': 329.74, 'AREA': 9_833.52,
                'GDP': 19_485.39, 'CONT': 'N.America', 'IND_DAY': '1776-07-04'},
        'IDN': {'COUNTRY': 'Indonesia', 'POP': 268.07, 'AREA': 1_910.93,
                'GDP': 1_015.54, 'CONT': 'Asia', 'IND_DAY': '1945-08-17'},
        'BRA': {'COUNTRY': 'Brazil', 'POP': 210.32, 'AREA': 8_515.77,
                'GDP': 2_055.51, 'CONT': 'S.America', 'IND_DAY': '1822-09-07'},
        'PAK': {'COUNTRY': 'Pakistan', 'POP': 205.71, 'AREA': 881.91,
                'GDP': 302.14, 'CONT': 'Asia', 'IND_DAY': '1947-08-14'},
        'NGA': {'COUNTRY': 'Nigeria', 'POP': 200.96, 'AREA': 923.77,
                'GDP': 375.77, 'CONT': 'Africa', 'IND_DAY': '1960-10-01'},
        'BGD': {'COUNTRY': 'Bangladesh', 'POP': 167.09, 'AREA': 147.57,
                'GDP': 245.63, 'CONT': 'Asia', 'IND_DAY': '1971-03-26'},
        'RUS': {'COUNTRY': 'Russia', 'POP': 146.79, 'AREA': 17_098.25,
                'GDP': 1_530.75, 'IND_DAY': '1992-06-12'},
        'MEX': {'COUNTRY': 'Mexico', 'POP': 126.58, 'AREA': 1_964.38,
                'GDP': 1_158.23, 'CONT': 'N.America', 'IND_DAY': '1810-09-16'},
        'JPN': {'COUNTRY': 'Japan', 'POP': 126.22, 'AREA': 377.97,
                'GDP': 4_872.42, 'CONT': 'Asia'},
        'DEU': {'COUNTRY': 'Germany', 'POP': 83.02, 'AREA': 357.11,
                'GDP': 3_693.20, 'CONT': 'Europe'},
        'FRA': {'COUNTRY': 'France', 'POP': 67.02, 'AREA': 640.68,
                'GDP': 2_582.49, 'CONT': 'Europe', 'IND_DAY': '1789-07-14'},
        'GBR': {'COUNTRY': 'UK', 'POP': 66.44, 'AREA': 242.50,
                'GDP': 2_631.23, 'CONT': 'Europe'},
        'ITA': {'COUNTRY': 'Italy', 'POP': 60.36, 'AREA': 301.34,
                'GDP': 1_943.84, 'CONT': 'Europe'},
        'ARG': {'COUNTRY': 'Argentina', 'POP': 44.94, 'AREA': 2_780.40,
                'GDP': 637.49, 'CONT': 'S.America', 'IND_DAY': '1816-07-09'},
        'DZA': {'COUNTRY': 'Algeria', 'POP': 43.38, 'AREA': 2_381.74,
                'GDP': 167.56, 'CONT': 'Africa', 'IND_DAY': '1962-07-05'},
        'CAN': {'COUNTRY': 'Canada', 'POP': 37.59, 'AREA': 9_984.67,
                'GDP': 1_647.12, 'CONT': 'N.America', 'IND_DAY': '1867-07-01'},
        'AUS': {'COUNTRY': 'Australia', 'POP': 25.47, 'AREA': 7_692.02,
                'GDP': 1_408.68, 'CONT': 'Oceania'},
        'KAZ': {'COUNTRY': 'Kazakhstan', 'POP': 18.53, 'AREA': 2_724.90,
                'GDP': 159.41, 'CONT': 'Asia', 'IND_DAY': '1991-12-16'}
    }

    columns = ('COUNTRY', 'POP', 'AREA', 'GDP', 'CONT', 'IND_DAY')

Each row of the table is written as an inner dictionary whose keys are the column names and whose values are the corresponding data. These dictionaries are then collected as the values in the outer data dictionary. The corresponding keys for data are the three-letter country codes.

You can use this data to create an instance of a Pandas DataFrame. First, you need to import Pandas:

    >>> import pandas as pd

Now that you have Pandas imported, you can use the DataFrame constructor and data to create a DataFrame object.

data is organized in such a way that the country codes correspond to columns. You can transpose the rows and columns of a DataFrame with the property .T:

    >>> df = pd.DataFrame(data=data).T
    >>> df
            COUNTRY      POP     AREA      GDP       CONT     IND_DAY
    CHN       China  1398.72  9596.96  12234.8       Asia         NaN
    IND       India  1351.16  3287.26  2575.67       Asia  1947-08-15
    USA          US   329.74  9833.52  19485.4  N.America  1776-07-04
    IDN   Indonesia   268.07  1910.93  1015.54       Asia  1945-08-17
    BRA      Brazil   210.32  8515.77  2055.51  S.America  1822-09-07
    PAK    Pakistan   205.71   881.91   302.14       Asia  1947-08-14
    NGA     Nigeria   200.96   923.77   375.77     Africa  1960-10-01
    BGD  Bangladesh   167.09   147.57   245.63       Asia  1971-03-26
    RUS      Russia   146.79  17098.2  1530.75        NaN  1992-06-12
    MEX      Mexico   126.58  1964.38  1158.23  N.America  1810-09-16
    JPN       Japan   126.22   377.97  4872.42       Asia         NaN
    DEU     Germany    83.02   357.11   3693.2     Europe         NaN
    FRA      France    67.02   640.68  2582.49     Europe  1789-07-14
    GBR          UK    66.44    242.5  2631.23     Europe         NaN
    ITA       Italy    60.36   301.34  1943.84     Europe         NaN
    ARG   Argentina    44.94   2780.4   637.49  S.America  1816-07-09
    DZA     Algeria    43.38  2381.74   167.56     Africa  1962-07-05
    CAN      Canada    37.59  9984.67  1647.12  N.America  1867-07-01
    AUS   Australia    25.47  7692.02  1408.68    Oceania         NaN
    KAZ  Kazakhstan    18.53   2724.9   159.41       Asia  1991-12-16

Now you have your DataFrame object populated with the data about each country.
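Because pd.DataFrame(data=data) puts the countries in columns, each of those columns mixes strings and floats, so the transposed DataFrame ends up with the generic object dtype everywhere. Below is a minimal sketch of one alternative that the tutorial itself doesn't use: DataFrame.from_dict() with orient='index', which treats the outer keys as row labels from the start. The two-country sample is a trimmed copy of data, used only to keep the example short:

```python
import pandas as pd

# Trimmed two-row sample of the tutorial's nested dictionary
sample = {
    'CHN': {'COUNTRY': 'China', 'POP': 1_398.72, 'AREA': 9_596.96,
            'GDP': 12_234.78, 'CONT': 'Asia'},
    'IND': {'COUNTRY': 'India', 'POP': 1_351.16, 'AREA': 3_287.26,
            'GDP': 2_575.67, 'CONT': 'Asia', 'IND_DAY': '1947-08-15'},
}
columns = ('COUNTRY', 'POP', 'AREA', 'GDP', 'CONT', 'IND_DAY')

# Tutorial approach: countries become columns, then .T swaps rows and columns
df_t = pd.DataFrame(data=sample).T

# Alternative: the outer keys become row labels directly;
# columns= fixes the column order explicitly
df_idx = pd.DataFrame.from_dict(sample, orient='index', columns=list(columns))

print(df_t.dtypes)    # every column is object after the transpose
print(df_idx.dtypes)  # POP, AREA, and GDP keep a float dtype
```

Keeping numeric dtypes may matter if you compute with POP, AREA, or GDP later; for the file round trips in this tutorial, the .T approach works fine.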

Versions of Python older than 3.6 did not guarantee the order of keys in dictionaries. To ensure the order of columns is maintained for older versions of Python and Pandas, you can specify index=columns:

    >>> df = pd.DataFrame(data=data, index=columns).T

Now that you've prepared your data, you're ready to start working with files!

Using the Pandas read_csv() and .to_csv() Functions

A comma-separated values (CSV) file is a plaintext file with a .csv extension that holds tabular data. This is one of the most popular file formats for storing large amounts of data. Each row of the CSV file represents a single table row. The values in the same row are by default separated with commas, but you could change the separator to a semicolon, tab, space, or some other character.
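As a small sketch of changing the separator, the snippet below writes a two-row sample DataFrame with sep=';' and reads it back with the same setting. It uses io.StringIO in place of a real file so nothing touches the disk; the sample frame is illustrative only:

```python
import io

import pandas as pd

# Small sample frame with two of the tutorial's column labels
df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

# Write with semicolons instead of commas; without a path, the text is returned
text = df.to_csv(sep=';')
print(text)

# Read it back: read_csv() needs the same separator to split the fields
df_back = pd.read_csv(io.StringIO(text), sep=';', index_col=0)
print(df_back)
```

If the separator passed to read_csv() doesn't match the one used when writing, each line would be read as a single field instead of being split into columns.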

Write a CSV File

You can save your Pandas DataFrame as a CSV file with .to_csv():

    >>> df.to_csv('data.csv')

That's it! You've created the file data.csv in your current working directory. Here's how your CSV file should look:

    ,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
    CHN,China,1398.72,9596.96,12234.78,Asia,
    IND,India,1351.16,3287.26,2575.67,Asia,1947-08-15
    USA,US,329.74,9833.52,19485.39,N.America,1776-07-04
    IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17
    BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07
    PAK,Pakistan,205.71,881.91,302.14,Asia,1947-08-14
    NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01
    BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26
    RUS,Russia,146.79,17098.25,1530.75,,1992-06-12
    MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16
    JPN,Japan,126.22,377.97,4872.42,Asia,
    DEU,Germany,83.02,357.11,3693.2,Europe,
    FRA,France,67.02,640.68,2582.49,Europe,1789-07-14
    GBR,UK,66.44,242.5,2631.23,Europe,
    ITA,Italy,60.36,301.34,1943.84,Europe,
    ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09
    DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05
    CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01
    AUS,Australia,25.47,7692.02,1408.68,Oceania,
    KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16

This text file contains the data separated with commas. The first column contains the row labels. In some cases, you'll find them irrelevant. If you don't want to keep them, then you can pass the argument index=False to .to_csv().
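A minimal sketch comparing both settings on a two-row sample frame. Calling .to_csv() without a path returns the CSV text instead of writing a file, which makes the difference easy to inspect:

```python
import pandas as pd

df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

with_labels = df.to_csv()                # first column holds CHN and IND
without_labels = df.to_csv(index=False)  # row labels are left out entirely

print(with_labels)
print(without_labels)
```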

Read a CSV File

Once your data is saved in a CSV file, you'll likely want to load and use it from time to time. You can do that with the Pandas read_csv() function:

    >>> df = pd.read_csv('data.csv', index_col=0)
    >>> df
            COUNTRY      POP      AREA       GDP       CONT     IND_DAY
    CHN       China  1398.72   9596.96  12234.78       Asia         NaN
    IND       India  1351.16   3287.26   2575.67       Asia  1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America  1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America  1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia  1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa  1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26
    RUS      Russia   146.79  17098.25   1530.75        NaN  1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia         NaN
    DEU     Germany    83.02    357.11   3693.20     Europe         NaN
    FRA      France    67.02    640.68   2582.49     Europe  1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe         NaN
    ITA       Italy    60.36    301.34   1943.84     Europe         NaN
    ARG   Argentina    44.94   2780.40    637.49  S.America  1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa  1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America  1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-16

In this case, the Pandas read_csv() function returns a new DataFrame with the data and labels from the file data.csv, which you specified with the first argument. This string can be any valid path, including URLs.

The parameter index_col specifies the column from the CSV file that contains the row labels. You assign a zero-based column index to this parameter. You should set index_col when the CSV file contains the row labels to avoid loading them as data.
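To see what index_col changes, here is a small sketch that reads the same CSV text twice. The text mimics the layout of data.csv (an empty first header cell followed by row labels), and io.StringIO stands in for the file:

```python
import io

import pandas as pd

# CSV text shaped like data.csv: the first header cell is empty
csv_text = """,COUNTRY,POP
CHN,China,1398.72
IND,India,1351.16
"""

# Without index_col, the row labels are loaded as an ordinary data column
df_plain = pd.read_csv(io.StringIO(csv_text))
print(df_plain.columns.tolist())  # ['Unnamed: 0', 'COUNTRY', 'POP']

# With index_col=0, the first column becomes the row labels instead
df_labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_labeled.index.tolist())  # ['CHN', 'IND']
```

Without index_col, Pandas treats the unnamed first column as data and gives it a generated label, Unnamed: 0, while the rows get a default integer index.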

You'll learn more about using Pandas with CSV files later on in this tutorial. You can also check out Reading and Writing CSV Files in Python to see how to handle CSV files with the built-in Python library csv as well.

Using Pandas to Write and Read Excel Files

Microsoft Excel is probably the most widely used spreadsheet software. While older versions used binary .xls files, Excel 2007 introduced the new XML-based .xlsx file. You can read and write Excel files in Pandas, similar to CSV files. However, you'll need to install the following Python packages first:

  • xlwt to write to .xls files
  • openpyxl or XlsxWriter to write to .xlsx files
  • xlrd to read Excel files

You can install them using pip with a single command:

    $ pip install xlwt openpyxl xlsxwriter xlrd

You can also use Conda:

    $ conda install xlwt openpyxl xlsxwriter xlrd

Please note that you don't have to install all these packages. For example, you don't need both openpyxl and XlsxWriter. If you're going to work just with .xls files, then you don't need either of them! However, if you intend to work only with .xlsx files, then you're going to need at least one of them, but not xlwt. Take some time to decide which packages are right for your project.
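If you're unsure which writer packages are present in a given environment, a sketch like the following can check before you call .to_excel(). The helper available_xlsx_engine() is hypothetical and not part of Pandas; it relies only on importlib.util.find_spec() from the standard library, and the preference order between the two packages is an arbitrary choice:

```python
import importlib.util


def available_xlsx_engine():
    """Hypothetical helper: return the first installed .xlsx writer, or None.

    'xlsxwriter' and 'openpyxl' are both the import names of the packages
    and the engine names that Pandas accepts for writing .xlsx files.
    """
    for name in ('xlsxwriter', 'openpyxl'):
        # find_spec() returns None when the module can't be found
        if importlib.util.find_spec(name) is not None:
            return name
    return None  # neither package is installed


engine = available_xlsx_engine()
print(engine)
```

When the result isn't None, you could pass it on explicitly, for example df.to_excel('data.xlsx', engine=engine).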

Write an Excel File

Once you have those packages installed, you can save your DataFrame in an Excel file with .to_excel():

    >>> df.to_excel('data.xlsx')

The argument 'data.xlsx' represents the target file and, optionally, its path. The above statement should create the file data.xlsx in your current working directory. That file should look like this:

[Image: screenshot of data.xlsx opened in Excel]

The first column of the file contains the labels of the rows, while the other columns store data.

Read an Excel File

You can load data from Excel files with read_excel():

    >>> df = pd.read_excel('data.xlsx', index_col=0)
    >>> df
            COUNTRY      POP      AREA       GDP       CONT     IND_DAY
    CHN       China  1398.72   9596.96  12234.78       Asia         NaN
    IND       India  1351.16   3287.26   2575.67       Asia  1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America  1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America  1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia  1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa  1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26
    RUS      Russia   146.79  17098.25   1530.75        NaN  1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia         NaN
    DEU     Germany    83.02    357.11   3693.20     Europe         NaN
    FRA      France    67.02    640.68   2582.49     Europe  1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe         NaN
    ITA       Italy    60.36    301.34   1943.84     Europe         NaN
    ARG   Argentina    44.94   2780.40    637.49  S.America  1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa  1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America  1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-16

read_excel() returns a new DataFrame that contains the values from data.xlsx. You can also use read_excel() with OpenDocument spreadsheets, or .ods files.

You'll learn more about working with Excel files later on in this tutorial. You can also check out Using Pandas to Read Large Excel Files in Python.

Understanding the Pandas IO API

Pandas IO Tools is the API that allows you to save the contents of Series and DataFrame objects to the clipboard, objects, or files of various types. It also enables loading data from the clipboard, objects, or files.

Write Files

Series and DataFrame objects have methods that enable writing data and labels to the clipboard or files. They're named with the pattern .to_<file-type>(), where <file-type> is the type of the target file.

You've learned about .to_csv() and .to_excel(), but there are others, including:

  • .to_json()
  • .to_html()
  • .to_sql()
  • .to_pickle()

There are still more file types that you can write to, so this list is not exhaustive.
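As an example of one of these other writers, here is a sketch that stores a small sample DataFrame in an in-memory SQLite database with .to_sql() and loads it back with read_sql(). It assumes only the standard library sqlite3 module, which Pandas accepts as a connection without SQLAlchemy; the table name countries and the index label CODE are arbitrary choices for illustration:

```python
import sqlite3
from contextlib import closing

import pandas as pd

df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

# closing() makes sure the in-memory database connection is closed afterward
with closing(sqlite3.connect(':memory:')) as conn:
    # Write the frame to a table, storing the row labels in a column named CODE
    df.to_sql('countries', conn, index_label='CODE')
    # Read the table back, turning CODE into the row labels again
    df_sql = pd.read_sql('SELECT * FROM countries', conn, index_col='CODE')

print(df_sql)
```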

These methods have parameters specifying the target file path where you save the data and labels. This is mandatory in some cases and optional in others. If this option is available and you choose to omit it, then the methods return the objects (like strings or iterables) with the contents of DataFrame instances.
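A minimal sketch of the "omit the path" behavior, using .to_json() and .to_html() on a two-row sample frame. In both cases no file is created, and the method returns a string:

```python
import json

import pandas as pd

df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

# No target path: each writer returns the serialized contents as a string
as_json = df.to_json()
as_html = df.to_html()

# The JSON text is an ordinary str, so the standard library can parse it
parsed = json.loads(as_json)
print(parsed['POP'])
print(as_html.splitlines()[0])
```

Not every writer offers this option: .to_pickle(), for example, requires a path.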

The optional parameter compression decides how to compress the file with the data and labels. You'll learn more about it later. There are a few other parameters, but they're mostly specific to one or several methods. You won't go into them in detail here.
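As a sketch of the compression parameter on the writing side, the snippet below writes a gzip-compressed CSV into a temporary directory and then inspects the result with the standard library. The file name data.csv.gz and the sample frame are illustrative:

```python
import gzip
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv.gz')
    # Ask .to_csv() to gzip the output explicitly
    df.to_csv(path, compression='gzip')

    # Gzip files start with the two magic bytes 0x1f 0x8b
    with open(path, 'rb') as f:
        magic = f.read(2)

    # Decompress with the standard library to peek at the CSV header
    with gzip.open(path, 'rt') as f:
        first_line = f.readline().strip()

print(magic, first_line)
```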

Read Files

Pandas functions for reading the contents of files are named using the pattern read_<file-type>(), where <file-type> indicates the type of the file to read. You've already seen the Pandas read_csv() and read_excel() functions. Here are a few others:

  • read_json()
  • read_html()
  • read_sql()
  • read_pickle()

These functions have a parameter that specifies the target file path. It can be any valid string that represents the path, either on a local machine or in a URL. Other objects are also acceptable depending on the file type.
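For instance, file-like objects such as io.StringIO are accepted in place of a path. This sketch serializes a sample frame with .to_json(), wraps the resulting string in a StringIO buffer, and reads it back with read_json():

```python
import io

import pandas as pd

df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

# Serialize to a JSON string (no path given, so the text is returned)
json_text = df.to_json()

# read_json() accepts a file-like object, here an in-memory text buffer
df_back = pd.read_json(io.StringIO(json_text))
print(df_back)
```

Wrapping the text in StringIO, rather than passing the JSON string itself, also avoids the deprecation of literal JSON strings in newer Pandas releases.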

The optional parameter compression determines the type of decompression to use for the compressed files. You'll learn about it later in this tutorial. There are other parameters, but they're specific to one or several functions. You won't go into them in detail here.
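A minimal sketch of compression on both sides, assuming a recent Pandas version: .to_csv() writes a gzip-compressed file and read_csv() decompresses it. The file name data.csv.gz and the two-row DataFrame are hypothetical, and compression='gzip' is spelled out even though Pandas can usually infer it from the .gz extension:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data
df = pd.DataFrame({"POP": [1398.72, 1351.16]}, index=["CHN", "IND"])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv.gz"
    # Write a gzip-compressed CSV file
    df.to_csv(path, compression="gzip")
    # Every gzip stream starts with the magic bytes 0x1f 0x8b
    magic = path.read_bytes()[:2]
    # Decompress while reading
    restored = pd.read_csv(path, index_col=0, compression="gzip")

print(magic == b"\x1f\x8b")     # True
print(restored.index.tolist())  # ['CHN', 'IND']
```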

Working With Different File Types

The Pandas library offers a wide range of possibilities for saving your data to files and loading data from files. In this section, you'll learn more about working with CSV and Excel files. You'll also see how to use other types of files, like JSON, web pages, databases, and Python pickle files.

CSV Files

You've already learned how to read and write CSV files. Now let's dig a little deeper into the details. When you use .to_csv() to save your DataFrame, you can provide an argument for the parameter path_or_buf to specify the path, name, and extension of the target file.

path_or_buf is the first argument .to_csv() will get. It can be any string that represents a valid file path that includes the file name and its extension. You've seen this in a previous example. However, if you omit path_or_buf, then .to_csv() won't create any files. Instead, it'll return the corresponding string:

    >>> df = pd.DataFrame(data=data).T
    >>> s = df.to_csv()
    >>> print(s)
    ,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
    CHN,China,1398.72,9596.96,12234.78,Asia,
    IND,India,1351.16,3287.26,2575.67,Asia,1947-08-15
    USA,US,329.74,9833.52,19485.39,N.America,1776-07-04
    IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17
    BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07
    PAK,Pakistan,205.71,881.91,302.14,Asia,1947-08-14
    NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01
    BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26
    RUS,Russia,146.79,17098.25,1530.75,,1992-06-12
    MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16
    JPN,Japan,126.22,377.97,4872.42,Asia,
    DEU,Germany,83.02,357.11,3693.2,Europe,
    FRA,France,67.02,640.68,2582.49,Europe,1789-07-14
    GBR,UK,66.44,242.5,2631.23,Europe,
    ITA,Italy,60.36,301.34,1943.84,Europe,
    ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09
    DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05
    CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01
    AUS,Australia,25.47,7692.02,1408.68,Oceania,
    KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16

Now you have the string s instead of a CSV file. You also have some missing values in your DataFrame object. For example, the continent for Russia and the independence days for several countries (China, Japan, and so on) are not available. In data science and machine learning, you must handle missing values carefully. Pandas excels here! By default, Pandas uses the NaN value to replace the missing values.

The continent that corresponds to Russia in df is nan:

    >>> df.loc['RUS', 'CONT']
    nan

This example uses .loc[] to get data with the specified row and column names.

When you save your DataFrame to a CSV file, empty strings ('') will represent the missing data. You can see this both in your file data.csv and in the string s. If you want to change this behavior, then use the optional parameter na_rep:

    >>> df.to_csv('new-data.csv', na_rep='(missing)')

This code produces the file new-data.csv where the missing values are no longer empty strings. Here's how this file should look:

    ,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
    CHN,China,1398.72,9596.96,12234.78,Asia,(missing)
    IND,India,1351.16,3287.26,2575.67,Asia,1947-08-15
    USA,US,329.74,9833.52,19485.39,N.America,1776-07-04
    IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17
    BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07
    PAK,Pakistan,205.71,881.91,302.14,Asia,1947-08-14
    NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01
    BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26
    RUS,Russia,146.79,17098.25,1530.75,(missing),1992-06-12
    MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16
    JPN,Japan,126.22,377.97,4872.42,Asia,(missing)
    DEU,Germany,83.02,357.11,3693.2,Europe,(missing)
    FRA,France,67.02,640.68,2582.49,Europe,1789-07-14
    GBR,UK,66.44,242.5,2631.23,Europe,(missing)
    ITA,Italy,60.36,301.34,1943.84,Europe,(missing)
    ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09
    DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05
    CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01
    AUS,Australia,25.47,7692.02,1408.68,Oceania,(missing)
    KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16

Now, the string '(missing)' in the file corresponds to the nan values from df.

When Pandas reads files, it considers the empty string ('') and a few others as missing values by default:

  • 'nan'
  • '-nan'
  • 'NA'
  • 'N/A'
  • 'NaN'
  • 'null'

If you don't want this behavior, then you can pass keep_default_na=False to the Pandas read_csv() function. To specify other labels for missing values, use the parameter na_values:

    >>> pd.read_csv('new-data.csv', index_col=0, na_values='(missing)')
            COUNTRY      POP      AREA       GDP       CONT     IND_DAY
    CHN       China  1398.72   9596.96  12234.78       Asia         NaN
    IND       India  1351.16   3287.26   2575.67       Asia  1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America  1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America  1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia  1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa  1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26
    RUS      Russia   146.79  17098.25   1530.75        NaN  1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia         NaN
    DEU     Germany    83.02    357.11   3693.20     Europe         NaN
    FRA      France    67.02    640.68   2582.49     Europe  1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe         NaN
    ITA       Italy    60.36    301.34   1943.84     Europe         NaN
    ARG   Argentina    44.94   2780.40    637.49  S.America  1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa  1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America  1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-16

Here, you've marked the string '(missing)' as a new missing data label, and Pandas replaced it with nan when it read the file.
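The example above doesn't show keep_default_na in action. Here's a small sketch with hypothetical data where the default labels get in the way: 'NA' is Namibia's country code, yet read_csv() treats it as missing unless you pass keep_default_na=False. Combining that with na_values keeps only your own labels:

```python
import io

import pandas as pd

# Hypothetical data: "NA" is Namibia's country code, not a missing value
codes = "CODE,COUNTRY\nNA,Namibia\nKAZ,Kazakhstan\n"

default = pd.read_csv(io.StringIO(codes))
raw = pd.read_csv(io.StringIO(codes), keep_default_na=False)

print(default.loc[0, "CODE"])  # nan ('NA' is a default missing-value label)
print(raw.loc[0, "CODE"])      # NA (default labels are switched off)

# Only the custom label counts as missing; "NA" survives as text
mixed = "CODE,CONT\nNA,Africa\nRUS,(missing)\n"
custom = pd.read_csv(
    io.StringIO(mixed), keep_default_na=False, na_values=["(missing)"]
)
print(custom.loc[0, "CODE"])  # NA
print(custom.loc[1, "CONT"])  # nan
```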

When you load data from a file, Pandas assigns the data types to the values of each column by default. You can check these types with .dtypes:

    >>> df = pd.read_csv('data.csv', index_col=0)
    >>> df.dtypes
    COUNTRY     object
    POP        float64
    AREA       float64
    GDP        float64
    CONT        object
    IND_DAY     object
    dtype: object

The columns with strings and dates ('COUNTRY', 'CONT', and 'IND_DAY') have the data type object. Meanwhile, the numeric columns contain 64-bit floating-point numbers (float64).

You can use the parameter dtype to specify the desired data types and parse_dates to force use of datetimes:

    >>> dtypes = {'POP': 'float32', 'AREA': 'float32', 'GDP': 'float32'}
    >>> df = pd.read_csv('data.csv', index_col=0, dtype=dtypes,
    ...                  parse_dates=['IND_DAY'])
    >>> df.dtypes
    COUNTRY            object
    POP               float32
    AREA              float32
    GDP               float32
    CONT               object
    IND_DAY    datetime64[ns]
    dtype: object
    >>> df['IND_DAY']
    CHN          NaT
    IND   1947-08-15
    USA   1776-07-04
    IDN   1945-08-17
    BRA   1822-09-07
    PAK   1947-08-14
    NGA   1960-10-01
    BGD   1971-03-26
    RUS   1992-06-12
    MEX   1810-09-16
    JPN          NaT
    DEU          NaT
    FRA   1789-07-14
    GBR          NaT
    ITA          NaT
    ARG   1816-07-09
    DZA   1962-07-05
    CAN   1867-07-01
    AUS          NaT
    KAZ   1991-12-16
    Name: IND_DAY, dtype: datetime64[ns]

Now, you have 32-bit floating-point numbers (float32) as specified with dtype. These differ slightly from the original 64-bit numbers because of their lower precision. The values in the last column are treated as dates and have the data type datetime64. That's why the NaN values in this column are replaced with NaT.
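To see both effects on a small scale, here's a sketch using a hypothetical two-row CSV held in memory, assuming a recent Pandas version. The last line shows that the float32 value is close to, but not exactly, 1398.72:

```python
import io

import pandas as pd

# Hypothetical CSV: China has no independence day, so that field is empty
csv_text = (
    ",POP,IND_DAY\n"
    "CHN,1398.72,\n"
    "IND,1351.16,1947-08-15\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    dtype={"POP": "float32"},  # 32-bit floats instead of the default float64
    parse_dates=["IND_DAY"],   # parse this column as datetimes
)

print(df["POP"].dtype)              # float32
print(df.loc["CHN", "IND_DAY"])     # NaT (the empty field is missing)
print(float(df.loc["CHN", "POP"]))  # near 1398.72, off in later digits
```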

Now that you have real dates, you can save them in the format you like:

    >>> df = pd.read_csv('data.csv', index_col=0, parse_dates=['IND_DAY'])
    >>> df.to_csv('formatted-data.csv', date_format='%B %d, %Y')

Here, you've specified the parameter date_format to be '%B %d, %Y'. This is how the resulting file should look:

    ,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
    CHN,China,1398.72,9596.96,12234.78,Asia,
    IND,India,1351.16,3287.26,2575.67,Asia,"August 15, 1947"
    USA,US,329.74,9833.52,19485.39,N.America,"July 04, 1776"
    IDN,Indonesia,268.07,1910.93,1015.54,Asia,"August 17, 1945"
    BRA,Brazil,210.32,8515.77,2055.51,S.America,"September 07, 1822"
    PAK,Pakistan,205.71,881.91,302.14,Asia,"August 14, 1947"
    NGA,Nigeria,200.96,923.77,375.77,Africa,"October 01, 1960"
    BGD,Bangladesh,167.09,147.57,245.63,Asia,"March 26, 1971"
    RUS,Russia,146.79,17098.25,1530.75,,"June 12, 1992"
    MEX,Mexico,126.58,1964.38,1158.23,N.America,"September 16, 1810"
    JPN,Japan,126.22,377.97,4872.42,Asia,
    DEU,Germany,83.02,357.11,3693.2,Europe,
    FRA,France,67.02,640.68,2582.49,Europe,"July 14, 1789"
    GBR,UK,66.44,242.5,2631.23,Europe,
    ITA,Italy,60.36,301.34,1943.84,Europe,
    ARG,Argentina,44.94,2780.4,637.49,S.America,"July 09, 1816"
    DZA,Algeria,43.38,2381.74,167.56,Africa,"July 05, 1962"
    CAN,Canada,37.59,9984.67,1647.12,N.America,"July 01, 1867"
    AUS,Australia,25.47,7692.02,1408.68,Oceania,
    KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,"December 16, 1991"

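The format of the dates is different now. The format '%B %d, %Y' means the date will first display the full name of the month, then the zero-padded day followed by a comma, and finally the full year. Because the formatted dates contain commas, Pandas wraps them in double quotes so they aren't mistaken for separators.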
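You can check this quoting behavior without touching the disk. This sketch, assuming a recent Pandas version and an English (C) locale for month names, formats a hypothetical two-row datetime column and prints the resulting CSV text:

```python
import pandas as pd

# Hypothetical one-column frame with a real date and a missing one
df = pd.DataFrame(
    {"IND_DAY": pd.to_datetime(["1947-08-15", None])},
    index=["IND", "CHN"],
)

s = df.to_csv(date_format="%B %d, %Y")
for line in s.splitlines():
    print(line)
# ,IND_DAY
# IND,"August 15, 1947"
# CHN,
```

The missing date becomes an empty field, just like the other missing values you saw earlier.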

There are several other optional parameters that you can use with .to_csv():

  • sep denotes a values separator.
  • decimal indicates a decimal separator.
  • encoding sets the file encoding.
  • header specifies whether you want to write column labels in the file.

Here's how you would pass arguments for sep and header:

    >>> s = df.to_csv(sep=';', header=False)
    >>> print(s)
    CHN;China;1398.72;9596.96;12234.78;Asia;
    IND;India;1351.16;3287.26;2575.67;Asia;1947-08-15
    USA;US;329.74;9833.52;19485.39;N.America;1776-07-04
    IDN;Indonesia;268.07;1910.93;1015.54;Asia;1945-08-17
    BRA;Brazil;210.32;8515.77;2055.51;S.America;1822-09-07
    PAK;Pakistan;205.71;881.91;302.14;Asia;1947-08-14
    NGA;Nigeria;200.96;923.77;375.77;Africa;1960-10-01
    BGD;Bangladesh;167.09;147.57;245.63;Asia;1971-03-26
    RUS;Russia;146.79;17098.25;1530.75;;1992-06-12
    MEX;Mexico;126.58;1964.38;1158.23;N.America;1810-09-16
    JPN;Japan;126.22;377.97;4872.42;Asia;
    DEU;Germany;83.02;357.11;3693.2;Europe;
    FRA;France;67.02;640.68;2582.49;Europe;1789-07-14
    GBR;UK;66.44;242.5;2631.23;Europe;
    ITA;Italy;60.36;301.34;1943.84;Europe;
    ARG;Argentina;44.94;2780.4;637.49;S.America;1816-07-09
    DZA;Algeria;43.38;2381.74;167.56;Africa;1962-07-05
    CAN;Canada;37.59;9984.67;1647.12;N.America;1867-07-01
    AUS;Australia;25.47;7692.02;1408.68;Oceania;
    KAZ;Kazakhstan;18.53;2724.9;159.41;Asia;1991-12-16

The data is separated with a semicolon (';') because you've specified sep=';'. Also, since you passed header=False, you see your data without the header row of column names.
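The decimal parameter isn't shown above. Here's a sketch of a European-style round trip with a hypothetical DataFrame, assuming a recent Pandas version: values are separated by semicolons and use a comma as the decimal mark, and read_csv() needs the same sep and decimal arguments to turn the text back into floats:

```python
import io

import pandas as pd

# Hypothetical data
df = pd.DataFrame({"POP": [1398.72, 1351.16]}, index=["CHN", "IND"])

# Semicolons between values, commas as decimal marks
s = df.to_csv(sep=";", decimal=",")
print(s.splitlines()[1])  # CHN;1398,72

# Parse it back with matching settings
back = pd.read_csv(io.StringIO(s), sep=";", decimal=",", index_col=0)
print(back["POP"].dtype)  # float64
```

Without decimal=',', read_csv() would keep values like '1398,72' as strings, since they don't look like numbers to the default parser.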

The Pandas read_csv() function has many additional options for managing missing data, working with dates and times, quoting, encoding, handling errors, and more. For instance, if you have a file with one data column and want to get a Series object instead of a DataFrame, then you can pass squeeze=True to read_csv(). You'll learn later about data compression and decompression, as well as how to skip rows and columns.
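Note that squeeze=True belongs to older Pandas releases like the 0.25.1 used here: the parameter was deprecated in Pandas 1.4 and removed in 2.0. A minimal sketch of the replacement, assuming a recent version and a hypothetical one-column CSV, reads the file normally and then calls DataFrame.squeeze('columns') on the result:

```python
import io

import pandas as pd

# Hypothetical single-column CSV
csv_text = "POP\n1398.72\n1351.16\n329.74\n"

frame = pd.read_csv(io.StringIO(csv_text))
pop = frame.squeeze("columns")  # a one-column DataFrame becomes a Series

print(type(frame).__name__)  # DataFrame
print(type(pop).__name__)    # Series
print(pop.name)              # POP
```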

JSON Files

JSON stands for JavaScript Object Notation. JSON files are plaintext files used for data interchange, and humans can read them easily. They follow the ISO/IEC 21778:2017 and ECMA-404 standards and use the .json extension. Python and Pandas work well with JSON files, as Python's json library offers built-in support for them.

You can save the data from your DataFrame to a JSON file with .to_json(). Start by creating a DataFrame object again. Use the dictionary data that holds the data about countries and then apply .to_json():

    >>> df = pd.DataFrame(data=data).T
    >>> df.to_json('data-columns.json')

This code produces the file data-columns.json. Here's how this file should look:

    {"COUNTRY":{"CHN":"China","IND":"India","USA":"US","IDN":"Indonesia","BRA":"Brazil","PAK":"Pakistan","NGA":"Nigeria","BGD":"Bangladesh","RUS":"Russia","MEX":"Mexico","JPN":"Japan","DEU":"Germany","FRA":"France","GBR":"UK","ITA":"Italy","ARG":"Argentina","DZA":"Algeria","CAN":"Canada","AUS":"Australia","KAZ":"Kazakhstan"},
    "POP":{"CHN":1398.72,"IND":1351.16,"USA":329.74,"IDN":268.07,"BRA":210.32,"PAK":205.71,"NGA":200.96,"BGD":167.09,"RUS":146.79,"MEX":126.58,"JPN":126.22,"DEU":83.02,"FRA":67.02,"GBR":66.44,"ITA":60.36,"ARG":44.94,"DZA":43.38,"CAN":37.59,"AUS":25.47,"KAZ":18.53},
    "AREA":{"CHN":9596.96,"IND":3287.26,"USA":9833.52,"IDN":1910.93,"BRA":8515.77,"PAK":881.91,"NGA":923.77,"BGD":147.57,"RUS":17098.25,"MEX":1964.38,"JPN":377.97,"DEU":357.11,"FRA":640.68,"GBR":242.5,"ITA":301.34,"ARG":2780.4,"DZA":2381.74,"CAN":9984.67,"AUS":7692.02,"KAZ":2724.9},
    "GDP":{"CHN":12234.78,"IND":2575.67,"USA":19485.39,"IDN":1015.54,"BRA":2055.51,"PAK":302.14,"NGA":375.77,"BGD":245.63,"RUS":1530.75,"MEX":1158.23,"JPN":4872.42,"DEU":3693.2,"FRA":2582.49,"GBR":2631.23,"ITA":1943.84,"ARG":637.49,"DZA":167.56,"CAN":1647.12,"AUS":1408.68,"KAZ":159.41},
    "CONT":{"CHN":"Asia","IND":"Asia","USA":"N.America","IDN":"Asia","BRA":"S.America","PAK":"Asia","NGA":"Africa","BGD":"Asia","RUS":null,"MEX":"N.America","JPN":"Asia","DEU":"Europe","FRA":"Europe","GBR":"Europe","ITA":"Europe","ARG":"S.America","DZA":"Africa","CAN":"N.America","AUS":"Oceania","KAZ":"Asia"},
    "IND_DAY":{"CHN":null,"IND":"1947-08-15","USA":"1776-07-04","IDN":"1945-08-17","BRA":"1822-09-07","PAK":"1947-08-14","NGA":"1960-10-01","BGD":"1971-03-26","RUS":"1992-06-12","MEX":"1810-09-16","JPN":null,"DEU":null,"FRA":"1789-07-14","GBR":null,"ITA":null,"ARG":"1816-07-09","DZA":"1962-07-05","CAN":"1867-07-01","AUS":null,"KAZ":"1991-12-16"}}

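data-columns.json has one large dictionary with the column labels as keys and the respective inner dictionaries as values. Each inner dictionary maps the row labels to that column's values, and missing values appear as null.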
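To confirm this nesting from Python, you can parse the output of .to_json() with the standard json module. Here's a minimal sketch with a hypothetical two-row DataFrame, assuming a recent Pandas version:

```python
import json

import pandas as pd

# Hypothetical data with one missing continent
df = pd.DataFrame(
    {"POP": [1398.72, 1351.16], "CONT": ["Asia", None]},
    index=["CHN", "IND"],
)

# Default orient='columns': outer keys are column labels,
# inner keys are row labels
parsed = json.loads(df.to_json())

print(list(parsed))           # ['POP', 'CONT']
print(parsed["POP"]["CHN"])   # 1398.72
print(parsed["CONT"]["IND"])  # None (written as JSON null)
```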

You can get a different file structure if you pass an argument for the optional parameter orient:

    >>> df.to_json('data-index.json', orient='index')

The orient parameter defaults to 'columns'. Here, you've set it to 'index'.
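The difference between the two orientations is just which label comes first. This sketch, using a hypothetical two-row DataFrame and assuming a recent Pandas version, parses both outputs with json.loads() and also reads the 'index' version back with read_json(). Wrapping the string in io.StringIO avoids the deprecation warning that recent Pandas versions emit for literal JSON strings:

```python
import io
import json

import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {"POP": [1398.72, 1351.16], "CONT": ["Asia", "Asia"]},
    index=["CHN", "IND"],
)

by_column = json.loads(df.to_json(orient="columns"))  # the default
by_row = json.loads(df.to_json(orient="index"))

print(by_column["POP"]["CHN"])  # 1398.72 (column label first)
print(by_row["CHN"]["POP"])     # 1398.72 (row label first)

# Reading the 'index' layout back requires the same orient value
restored = pd.read_json(io.StringIO(df.to_json(orient="index")), orient="index")
print(restored.loc["IND", "CONT"])  # Asia
```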

You should get a new file data-index.json. Here's what it contains:

    {"CHN":{"COUNTRY":"China","POP":1398.72,"AREA":9596.96,"GDP":12234.78,"CONT":"Asia","IND_DAY":null},
    "IND":{"COUNTRY":"India","POP":1351.16,"AREA":3287.26,"GDP":2575.67,"CONT":"Asia","IND_DAY":"1947-08-15"},
    "USA":{"COUNTRY":"US","POP":329.74,"AREA":9833.52,"GDP":19485.39,"CONT":"N.America","IND_DAY":"1776-07-04"},
    "IDN":{"COUNTRY":"Indonesia","POP":268.07,"AREA":1910.93,"GDP":1015.54,"CONT":"Asia","IND_DAY":"1945-08-17"},
    "BRA":{"COUNTRY":"Brazil","POP":210.32,"AREA":8515.77,"GDP":2055.51,"CONT":"S.America","IND_DAY":"1822-09-07"},
    "PAK":{"COUNTRY":"Pakistan","POP":205.71,"AREA":881.91,"GDP":302.14,"CONT":"Asia","IND_DAY":"1947-08-14"},
    "NGA":{"COUNTRY":"Nigeria","POP":200.96,"AREA":923.77,"GDP":375.77,"CONT":"Africa"
        ,                  "IND_DAY"                  :                  "1960-x-01"                  },                  "BGD"                  :{                  "Land"                  :                  "People's republic of bangladesh"                  ,                  "Pop"                  :                  167.09                  ,                  "AREA"                  :                  147.57                  ,                  "Gross domestic product"                  :                  245.63                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1971-03-26"                  },                  "RUS"                  :{                  "Land"                  :                  "Russia"                  ,                  "POP"                  :                  146.79                  ,                  "Expanse"                  :                  17098.25                  ,                  "Gross domestic product"                  :                  1530.75                  ,                  "CONT"                  :                  zero                  ,                  "IND_DAY"                  :                  "1992-06-12"                  },                  "MEX"                  :{                  "COUNTRY"                  :                  "Mexico"                  ,                  "POP"                  :                  126.58                  ,                  "AREA"                  :                  1964.38                  ,                  "Gdp"                  :                  1158.23                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1810-09-16"                  },                  "JPN"                  :{                  "State"                  :                  
"Nihon"                  ,                  "POP"                  :                  126.22                  ,                  "AREA"                  :                  377.97                  ,                  "Gdp"                  :                  4872.42                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  null                  },                  "DEU"                  :{                  "COUNTRY"                  :                  "Germany"                  ,                  "POP"                  :                  83.02                  ,                  "AREA"                  :                  357.11                  ,                  "Gross domestic product"                  :                  3693.2                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "FRA"                  :{                  "COUNTRY"                  :                  "French republic"                  ,                  "Pop"                  :                  67.02                  ,                  "Area"                  :                  640.68                  ,                  "Gdp"                  :                  2582.49                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  "1789-07-14"                  },                  "GBR"                  :{                  "COUNTRY"                  :                  "UK"                  ,                  "Pop"                  :                  66.44                  ,                  "AREA"                  :                  242.v                  ,                  "GDP"                  :                  2631.23         
         ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "ITA"                  :{                  "COUNTRY"                  :                  "Italy"                  ,                  "Popular"                  :                  threescore.36                  ,                  "Expanse"                  :                  301.34                  ,                  "GDP"                  :                  1943.84                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  goose egg                  },                  "ARG"                  :{                  "COUNTRY"                  :                  "Argentina"                  ,                  "POP"                  :                  44.94                  ,                  "AREA"                  :                  2780.4                  ,                  "Gdp"                  :                  637.49                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1816-07-09"                  },                  "DZA"                  :{                  "State"                  :                  "Algeria"                  ,                  "POP"                  :                  43.38                  ,                  "Expanse"                  :                  2381.74                  ,                  "Gdp"                  :                  167.56                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1962-07-05"                  },                  "CAN"                  :{                  "State"      
            :                  "Canada"                  ,                  "POP"                  :                  37.59                  ,                  "AREA"                  :                  9984.67                  ,                  "GDP"                  :                  1647.12                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1867-07-01"                  },                  "AUS"                  :{                  "State"                  :                  "Australia"                  ,                  "POP"                  :                  25.47                  ,                  "Surface area"                  :                  7692.02                  ,                  "GDP"                  :                  1408.68                  ,                  "CONT"                  :                  "Oceania"                  ,                  "IND_DAY"                  :                  null                  },                  "KAZ"                  :{                  "State"                  :                  "Kazakhstan"                  ,                  "POP"                  :                  eighteen.53                  ,                  "Area"                  :                  2724.9                  ,                  "Gross domestic product"                  :                  159.41                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1991-12-sixteen"                  }}                              

data-index.json also has one large dictionary, but this time the row labels are the keys, and the inner dictionaries are the values.
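To see this layout without writing a file, here's a minimal sketch that uses a hypothetical two-row excerpt of the data rather than the full df. It calls .to_json() without a path, parses the returned string with the standard json module, and reads it back with pd.read_json() using the same orient. The string is wrapped in io.StringIO because recent pandas versions deprecate passing literal JSON text to read_json():

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical two-row excerpt of the tutorial's data
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "POP": [1398.72, 1351.16]},
    index=["CHN", "IND"],
)

# Without path_or_buf, .to_json() returns the JSON text as a string
text = df.to_json(orient="index")
parsed = json.loads(text)

# The outer keys are the row labels ...
print(list(parsed))   # ['CHN', 'IND']
# ... and each value is a dictionary mapping column names to cell values
print(parsed["IND"])  # {'COUNTRY': 'India', 'POP': 1351.16}

# Reading the text back with the same orient restores the row labels
df_back = pd.read_json(StringIO(text), orient="index")
print(df_back.index.tolist())  # ['CHN', 'IND']
```

If you read the file with a different orient, pandas would interpret the keys differently, so it's safest to pass the orient you used for writing.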

There are a few more options for orient. One of them is 'records':

    >>> df.to_json('data-records.json', orient='records')

This code should yield the file data-records.json with the following content:

    [{"COUNTRY":"China","POP":1398.72,"AREA":9596.96,"GDP":12234.78,"CONT":"Asia","IND_DAY":null},
     {"COUNTRY":"India","POP":1351.16,"AREA":3287.26,"GDP":2575.67,"CONT":"Asia","IND_DAY":"1947-08-15"},
     {"COUNTRY":"US","POP":329.74,"AREA":9833.52,"GDP":19485.39,"CONT":"N.America","IND_DAY":"1776-07-04"},
     {"COUNTRY":"Indonesia","POP":268.07,"AREA":1910.93,"GDP":1015.54,"CONT":"Asia","IND_DAY":"1945-08-17"},
     {"COUNTRY":"Brazil","POP":210.32,"AREA":8515.77,"GDP":2055.51,"CONT":"S.America","IND_DAY":"1822-09-07"},
     {"COUNTRY":"Pakistan","POP":205.71,"AREA":881.91,"GDP":302.14,"CONT":"Asia","IND_DAY":"1947-08-14"},
     {"COUNTRY":"Nigeria","POP":200.96,"AREA":923.77,"GDP":375.77,"CONT":"Africa","IND_DAY":"1960-10-01"},
     {"COUNTRY":"Bangladesh","POP":167.09,"AREA":147.57,"GDP":245.63,"CONT":"Asia","IND_DAY":"1971-03-26"},
     {"COUNTRY":"Russia","POP":146.79,"AREA":17098.25,"GDP":1530.75,"CONT":null,"IND_DAY":"1992-06-12"},
     {"COUNTRY":"Mexico","POP":126.58,"AREA":1964.38,"GDP":1158.23,"CONT":"N.America","IND_DAY":"1810-09-16"},
     {"COUNTRY":"Japan","POP":126.22,"AREA":377.97,"GDP":4872.42,"CONT":"Asia","IND_DAY":null},
     {"COUNTRY":"Germany","POP":83.02,"AREA":357.11,"GDP":3693.2,"CONT":"Europe","IND_DAY":null},
     {"COUNTRY":"France","POP":67.02,"AREA":640.68,"GDP":2582.49,"CONT":"Europe","IND_DAY":"1789-07-14"},
     {"COUNTRY":"UK","POP":66.44,"AREA":242.5,"GDP":2631.23,"CONT":"Europe","IND_DAY":null},
     {"COUNTRY":"Italy","POP":60.36,"AREA":301.34,"GDP":1943.84,"CONT":"Europe","IND_DAY":null},
     {"COUNTRY":"Argentina","POP":44.94,"AREA":2780.4,"GDP":637.49,"CONT":"S.America","IND_DAY":"1816-07-09"},
     {"COUNTRY":"Algeria","POP":43.38,"AREA":2381.74,"GDP":167.56,"CONT":"Africa","IND_DAY":"1962-07-05"},
     {"COUNTRY":"Canada","POP":37.59,"AREA":9984.67,"GDP":1647.12,"CONT":"N.America","IND_DAY":"1867-07-01"},
     {"COUNTRY":"Australia","POP":25.47,"AREA":7692.02,"GDP":1408.68,"CONT":"Oceania","IND_DAY":null},
     {"COUNTRY":"Kazakhstan","POP":18.53,"AREA":2724.9,"GDP":159.41,"CONT":"Asia","IND_DAY":"1991-12-16"}]

data-records.json holds a list with one dictionary for each row. The row labels are not written.
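Because orient='records' drops the row labels, reading such a file back gives you a default integer index. One way to keep the labels, sketched below with a hypothetical two-row excerpt and a hypothetical index name CODE, is to turn them into an ordinary column with .rename_axis() and .reset_index() before writing, then restore them with .set_index() after reading:

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row excerpt of the tutorial's data
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "POP": [1398.72, 1351.16]},
    index=["CHN", "IND"],
)

# Plain records: the labels CHN and IND don't appear in the output
plain_text = df.to_json(orient="records")
print(plain_text)  # [{"COUNTRY":"China","POP":1398.72},{"COUNTRY":"India","POP":1351.16}]
plain_back = pd.read_json(StringIO(plain_text), orient="records")
print(plain_back.index.tolist())  # [0, 1]

# Keep the labels by moving them into a (hypothetically named) CODE column
labeled_text = df.rename_axis("CODE").reset_index().to_json(orient="records")
labeled_back = pd.read_json(StringIO(labeled_text), orient="records").set_index("CODE")
print(labeled_back.index.tolist())  # ['CHN', 'IND']
```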

You can get another interesting file structure with orient='split':

    >>> df.to_json('data-split.json', orient='split')

The resulting file is data-split.json, which should look like this:

    {"columns":["COUNTRY","POP","AREA","GDP","CONT","IND_DAY"],
     "index":["CHN","IND","USA","IDN","BRA","PAK","NGA","BGD","RUS","MEX","JPN","DEU","FRA","GBR","ITA","ARG","DZA","CAN","AUS","KAZ"],
     "data":[["China",1398.72,9596.96,12234.78,"Asia",null],
             ["India",1351.16,3287.26,2575.67,"Asia","1947-08-15"],
             ["US",329.74,9833.52,19485.39,"N.America","1776-07-04"],
             ["Indonesia",268.07,1910.93,1015.54,"Asia","1945-08-17"],
             ["Brazil",210.32,8515.77,2055.51,"S.America","1822-09-07"],
             ["Pakistan",205.71,881.91,302.14,"Asia","1947-08-14"],
             ["Nigeria",200.96,923.77,375.77,"Africa","1960-10-01"],
             ["Bangladesh",167.09,147.57,245.63,"Asia","1971-03-26"],
             ["Russia",146.79,17098.25,1530.75,null,"1992-06-12"],
             ["Mexico",126.58,1964.38,1158.23,"N.America","1810-09-16"],
             ["Japan",126.22,377.97,4872.42,"Asia",null],
             ["Germany",83.02,357.11,3693.2,"Europe",null],
             ["France",67.02,640.68,2582.49,"Europe","1789-07-14"],
             ["UK",66.44,242.5,2631.23,"Europe",null],
             ["Italy",60.36,301.34,1943.84,"Europe",null],
             ["Argentina",44.94,2780.4,637.49,"S.America","1816-07-09"],
             ["Algeria",43.38,2381.74,167.56,"Africa","1962-07-05"],
             ["Canada",37.59,9984.67,1647.12,"N.America","1867-07-01"],
             ["Australia",25.47,7692.02,1408.68,"Oceania",null],
             ["Kazakhstan",18.53,2724.9,159.41,"Asia","1991-12-16"]]}

data-split.json contains one dictionary that holds the following lists:

  • The names of the columns
  • The labels of the rows
  • The inner lists (a two-dimensional sequence) that hold the data values
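Since the split layout stores the column names, the row labels, and the values separately, pd.read_json() with orient='split' can rebuild both axes. Here's a minimal sketch, again on a hypothetical two-row excerpt rather than the full dataset:

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical two-row excerpt of the tutorial's data
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "POP": [1398.72, 1351.16]},
    index=["CHN", "IND"],
)

text = df.to_json(orient="split")
parsed = json.loads(text)

# Three top-level keys: column names, row labels, and the row values
print(list(parsed))    # ['columns', 'index', 'data']
print(parsed["data"])  # [['China', 1398.72], ['India', 1351.16]]

# Both axes are stored, so read_json() can restore them
df_back = pd.read_json(StringIO(text), orient="split")
print(df_back.index.tolist())    # ['CHN', 'IND']
print(df_back.columns.tolist())  # ['COUNTRY', 'POP']
```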

If you don't provide the value for the optional parameter path_or_buf that defines the file path, then .to_json() will return a JSON string instead of writing the results to a file. This behavior is consistent with .to_csv().
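Here's a short sketch of both behaviors, assuming a small hypothetical DataFrame and a temporary directory so that no file is left behind. The returned string and the file contents should be identical:

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-column excerpt of the tutorial's data
df = pd.DataFrame({"POP": [1398.72, 1351.16]}, index=["CHN", "IND"])

# No path_or_buf: .to_json() returns the JSON text as a str
json_text = df.to_json()
print(json_text)  # {"POP":{"CHN":1398.72,"IND":1351.16}}

# .to_csv() behaves the same way when you omit its path
csv_text = df.to_csv()
print(type(csv_text).__name__)  # str

# With a path, the JSON goes to the file and the method returns None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pop.json")
    result = df.to_json(path)
    with open(path, encoding="utf-8") as f:
        on_disk = f.read()

print(result)                # None
print(on_disk == json_text)  # True
```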

There are other optional parameters you can use. For instance, with orient='split' or orient='table', you can set index=False to forgo saving row labels. You can control precision with double_precision, and dates with date_format and date_unit. These last two parameters are particularly important when you have time series among your data:

    >>> df = pd.DataFrame(data=data).T
    >>> df['IND_DAY'] = pd.to_datetime(df['IND_DAY'])
    >>> df.dtypes
    COUNTRY            object
    POP                object
    AREA               object
    GDP                object
    CONT               object
    IND_DAY    datetime64[ns]
    dtype: object
    >>> df.to_json('data-time.json')

In this example, you've created the DataFrame from the dictionary data and used to_datetime() to convert the values in the last column to datetime64. The resulting file looks like this:

    {"COUNTRY":{"CHN":"China","IND":"India","USA":"US","IDN":"Indonesia","BRA":"Brazil","PAK":"Pakistan","NGA":"Nigeria","BGD":"Bangladesh","RUS":"Russia","MEX":"Mexico","JPN":"Japan","DEU":"Germany","FRA":"France","GBR":"UK","ITA":"Italy","ARG":"Argentina","DZA":"Algeria","CAN":"Canada","AUS":"Australia","KAZ":"Kazakhstan"},
     "POP":{"CHN":1398.72,"IND":1351.16,"USA":329.74,"IDN":268.07,"BRA":210.32,"PAK":205.71,"NGA":200.96,"BGD":167.09,"RUS":146.79,"MEX":126.58,"JPN":126.22,"DEU":83.02,"FRA":67.02,"GBR":66.44,"ITA":60.36,"ARG":44.94,"DZA":43.38,"CAN":37.59,"AUS":25.47,"KAZ":18.53},
     "AREA":{"CHN":9596.96,"IND":3287.26,"USA":9833.52,
        "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.11                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.5                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.iv                  ,                  "DZA"                  :                  2381.74                  ,                  "CAN"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.9                  },                  "GDP"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "United states of america"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  
302.14                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "Can"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "S.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                 
 :                  null                  ,                  "MEX"                  :                  "North.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "S.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "Tin can"                  :                  "N.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  null                  ,                  "IND"                  :                  -706320000000                  ,                  "Us"                  :                  -6106060800000                  ,                  "IDN"                  :                  -769219200000                  ,                  "BRA"                  :                  -4648924800000                  ,                  "PAK"                  :                  -706406400000                  ,                  "NGA"                  :                  -291945600000                  ,                  "BGD"                  :                  38793600000                  ,                  "RUS"                  :                  708307200000                  ,                  "MEX"                  :                  -5026838400000                  ,                  "JPN"     
             :                  null                  ,                  "DEU"                  :                  null                  ,                  "FRA"                  :                  -5694969600000                  ,                  "GBR"                  :                  null                  ,                  "ITA"                  :                  cipher                  ,                  "ARG"                  :                  -4843411200000                  ,                  "DZA"                  :                  -236476800000                  ,                  "CAN"                  :                  -3234729600000                  ,                  "AUS"                  :                  null                  ,                  "KAZ"                  :                  692841600000                  }}                              

In this file, you have large integers instead of dates for the independence days. That's because the default value of the optional parameter date_format is 'epoch' whenever orient isn't 'table'. This default behavior expresses each date as the number of milliseconds elapsed since midnight UTC on January 1, 1970 (the Unix epoch), so dates before 1970 become negative numbers.
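To see what those integers mean, you can decode a few of them by hand. The sketch below uses only Python's standard library, not Pandas, and assumes the values are milliseconds since midnight UTC on January 1, 1970, as described above. The helper name `epoch_ms_to_date()` is hypothetical; in Pandas itself, `pd.to_datetime(values, unit='ms')` performs this kind of conversion.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: midnight UTC on January 1, 1970
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_date(ms):
    """Convert milliseconds since the epoch to a UTC datetime.

    Hypothetical helper. Adding a timedelta to EPOCH (instead of calling
    datetime.fromtimestamp()) also handles negative values, i.e. dates
    before 1970, on every platform.
    """
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the IND_DAY column of the JSON file above
ind_day = {"IND": -706320000000, "USA": -6106060800000, "KAZ": 692841600000}

for code, ms in ind_day.items():
    print(code, epoch_ms_to_date(ms).date())
# prints: IND 1947-08-15, USA 1776-07-04, KAZ 1991-12-16 (one per line)
```

The negative values for India and the United States simply count backward from 1970.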

However, if you pass date_format='iso', then you'll get the dates in the ISO 8601 format. In addition, the optional parameter date_unit decides the unit of time used for encoding ('s', 'ms', 'us', or 'ns'; the default is 'ms'):


    >>> df = pd.DataFrame(data=data).T
    >>> df['IND_DAY'] = pd.to_datetime(df['IND_DAY'])
    >>> df.to_json('new-data-time.json', date_format='iso', date_unit='s')

This code produces the following JSON file:

    {"COUNTRY":{"CHN":"China","IND":"India","USA":"US","IDN":"Indonesia","BRA":"Brazil","PAK":"Pakistan","NGA":"Nigeria","BGD":"Bangladesh","RUS":"Russia","MEX":"Mexico","JPN":"Japan","DEU":"Germany","FRA":"France","GBR":"UK","ITA":"Italy","ARG":"Argentina","DZA":"Algeria","CAN":"Canada","AUS":"Australia","KAZ":"Kazakhstan"},
     "POP":{"CHN":1398.72,"IND":1351.16,"USA":329.74,"IDN":268.07,"BRA":210.32,"PAK":205.71,"NGA":200.96,"BGD":167.09,"RUS":146.79,"MEX":126.58,"JPN":126.22,"DEU":83.02,"FRA":67.02,"GBR":66.44,"ITA":60.36,"ARG":44.94,"DZA":43.38,"CAN":37.59,"AUS":25.47,"KAZ":18.53},
     "AREA":{"CHN":9596.96,"IND":3287.26,"USA":9833.52,"IDN":1910.93,"BRA":8515.77,"PAK":881.91,"NGA":923.77,"BGD":147.57,"RUS":17098.25,"MEX":1964.38,"JPN":377.97,"DEU":357.11,"FRA":640.68,"GBR":242.5,"ITA":301.34,"ARG":2780.4,"DZA":2381.74,"CAN":9984.67,"AUS":7692.02,"KAZ":2724.9},
     "GDP":{"CHN":12234.78,"IND":2575.67,"USA":19485.39,"IDN":1015.54,"BRA":2055.51,"PAK":302.14,"NGA":375.77,"BGD":245.63,"RUS":1530.75,"MEX":1158.23,"JPN":4872.42,"DEU":3693.2,"FRA":2582.49,"GBR":2631.23,"ITA":1943.84,"ARG":637.49,"DZA":167.56,"CAN":1647.12,"AUS":1408.68,"KAZ":159.41},
     "CONT":{"CHN":"Asia","IND":"Asia","USA":"N.America","IDN":"Asia","BRA":"S.America","PAK":"Asia","NGA":"Africa","BGD":"Asia","RUS":null,"MEX":"N.America","JPN":"Asia","DEU":"Europe","FRA":"Europe","GBR":"Europe","ITA":"Europe","ARG":"S.America","DZA":"Africa","CAN":"N.America","AUS":"Oceania","KAZ":"Asia"},
     "IND_DAY":{"CHN":null,"IND":"1947-08-15T00:00:00Z","USA":"1776-07-04T00:00:00Z","IDN":"1945-08-17T00:00:00Z","BRA":"1822-09-07T00:00:00Z","PAK":"1947-08-14T00:00:00Z","NGA":"1960-10-01T00:00:00Z","BGD":"1971-03-26T00:00:00Z","RUS":"1992-06-12T00:00:00Z","MEX":"1810-09-16T00:00:00Z","JPN":null,"DEU":null,"FRA":"1789-07-14T00:00:00Z","GBR":null,"ITA":null,"ARG":"1816-07-09T00:00:00Z","DZA":"1962-07-05T00:00:00Z","CAN":"1867-07-01T00:00:00Z","AUS":null,"KAZ":"1991-12-16T00:00:00Z"}}

The dates in the resulting file are in the ISO 8601 format.
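Each string follows the pattern YYYY-MM-DDTHH:MM:SSZ, where the trailing Z marks UTC. As a rough, standard-library-only illustration (not how Pandas produces these strings internally), the sketch below renders an epoch-millisecond value in that layout with whole-second precision, matching what date_unit='s' yields in the file above. The function name `to_iso_seconds()` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: midnight UTC on January 1, 1970
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_iso_seconds(ms):
    """Format milliseconds since the epoch as an ISO 8601 UTC string
    with whole-second precision, e.g. '1947-08-15T00:00:00Z'.

    Hypothetical helper: %S has no fractional part, so any sub-second
    remainder is dropped; the literal 'Z' marks the time as UTC.
    """
    moment = EPOCH + timedelta(milliseconds=ms)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso_seconds(-706320000000))  # 1947-08-15T00:00:00Z
print(to_iso_seconds(-291945600000))  # 1960-10-01T00:00:00Z
```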

You can load the data from a JSON file with read_json():


    >>> df = pd.read_json('data-index.json', orient='index',
    ...                   convert_dates=['IND_DAY'])

The parameter convert_dates has a similar purpose to parse_dates when you use it to read CSV files. The optional parameter orient is very important because it tells Pandas how the file is structured, so it should match the orient that was used when the file was written.
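To see why orient matters, it helps to compare the two layouts directly. The sketch below uses Python's built-in json module rather than Pandas, and the two-row sample and the helper `index_to_columns()` are hypothetical. It reshapes an index-oriented document (row labels as the outer keys) into the column-oriented layout (column labels as the outer keys). It also parses the IND_DAY strings into dates, which is roughly the job convert_dates=['IND_DAY'] hands to Pandas. In this sample, the dates are assumed to be plain YYYY-MM-DD strings.

```python
import json
from datetime import date

# Hypothetical two-row sample in the 'index' layout:
# row labels are the outer keys, column labels the inner keys
INDEX_JSON = """
{"IND": {"COUNTRY": "India", "POP": 1351.16, "IND_DAY": "1947-08-15"},
 "KAZ": {"COUNTRY": "Kazakhstan", "POP": 18.53, "IND_DAY": "1991-12-16"}}
"""


def index_to_columns(rows, date_columns=()):
    """Reshape {row: {col: value}} into {col: {row: value}}.

    Values in date_columns are parsed from 'YYYY-MM-DD' strings into
    datetime.date objects; None (JSON null) is left unchanged.
    """
    columns = {}
    for row_label, record in rows.items():
        for col_label, value in record.items():
            if col_label in date_columns and value is not None:
                value = date.fromisoformat(value)
            columns.setdefault(col_label, {})[row_label] = value
    return columns


rows = json.loads(INDEX_JSON)
columns = index_to_columns(rows, date_columns=["IND_DAY"])
print(columns["POP"])      # {'IND': 1351.16, 'KAZ': 18.53}
print(columns["IND_DAY"])  # {'IND': datetime.date(1947, 8, 15), 'KAZ': datetime.date(1991, 12, 16)}
```

If you read index-oriented text as though it were column-oriented, the roles of the two label sets swap and the table comes out transposed, which is the mistake a matching orient prevents.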

There are other optional parameters you can use as well:

  • Set the encoding with encoding.
  • Manipulate dates with convert_dates and keep_default_dates.
  • Impact precision with dtype and precise_float.
  • Decode numeric data directly to NumPy arrays with numpy=True. (This parameter was deprecated in Pandas 1.0 and removed in Pandas 2.0.)

Note that you might lose the order of rows and columns when using the JSON format to store your data.

HTML Files

An HTML file is a plaintext file that uses HyperText Markup Language to help browsers render web pages. The extensions for HTML files are .html and .htm. You'll need to install an HTML parser library like lxml or html5lib to be able to work with HTML files:

    $ pip install lxml html5lib

You can also use Conda to install the same packages:

    $ conda install lxml html5lib
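Parsers such as lxml and html5lib turn raw markup into elements, which is what lets Pandas locate table elements and their cells when it reads HTML (for example, with read_html()). As a simplified, standard-library-only illustration of that job, and not what Pandas runs internally, the hypothetical TableCellParser class below uses Python's html.parser module to collect the text of every th and td cell, row by row.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical parser that collects <th>/<td> text, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep only cell text
        if self._cell is not None:
            self._cell.append(data)


# A small fragment shaped like the table that DataFrame.to_html() writes
SAMPLE = """
<table border="1" class="dataframe">
  <thead><tr><th></th><th>COUNTRY</th><th>POP</th></tr></thead>
  <tbody><tr><th>IND</th><td>India</td><td>1351.16</td></tr></tbody>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE)
print(parser.rows)  # [['', 'COUNTRY', 'POP'], ['IND', 'India', '1351.16']]
```

Real parsers also cope with malformed markup, nested tables, and colspan or rowspan attributes, which this sketch ignores.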

Once you have these libraries, you can save the contents of your DataFrame as an HTML file with .to_html():


    >>> df = pd.DataFrame(data=data).T
    >>> df.to_html('data.html')

This code generates a file data.html. Its contents should look like this:

    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;"><th></th><th>COUNTRY</th><th>POP</th><th>AREA</th><th>GDP</th><th>CONT</th><th>IND_DAY</th></tr>
      </thead>
      <tbody>
        <tr><th>CHN</th><td>China</td><td>1398.72</td><td>9596.96</td><td>12234.8</td><td>Asia</td><td>NaN</td></tr>
        <tr><th>IND</th><td>India</td><td>1351.16</td><td>3287.26</td><td>2575.67</td><td>Asia</td><td>1947-08-15</td></tr>
        <tr><th>USA</th><td>US</td><td>329.74</td><td>9833.52</td><td>19485.4</td><td>N.America</td><td>1776-07-04</td></tr>
        <tr><th>IDN</th><td>Indonesia</td><td>268.07</td><td>1910.93</td><td>1015.54</td><td>Asia</td><td>1945-08-17</td></tr>
        <tr><th>BRA</th><td>Brazil</td><td>210.32</td><td>8515.77</td><td>2055.51</td><td>S.America</td><td>1822-09-07</td></tr>
        <tr><th>PAK</th><td>Pakistan</td><td>205.71</td><td>881.91</td><td>302.14</td><td>Asia</td><td>1947-08-14</td></tr>
        <tr><th>NGA</th><td>Nigeria</td><td>200.96</td><td>923.77</td><td>375.77</td><td>Africa</td><td>1960-10-01</td></tr>
        <tr><th>BGD</th><td>Bangladesh</td><td>167.09</td><td>147.57</td><td>245.63</td><td>Asia</td><td>1971-03-26</td></tr>
        <tr><th>RUS</th><td>Russia</td><td>146.79</td><td>17098.2</td><td>1530.75</td><td>NaN</td><td>1992-06-12</td></tr>
        <tr><th>MEX</th><td>Mexico</td><td>126.58</td><td>1964.38</td><td>1158.23</td><td>N.America</td><td>1810-09-16</td></tr>
        <tr><th>JPN</th><td>Japan</td><td>126.22</td><td>377.97</td><td>4872.42</td><td>Asia</td><td>NaN</td></tr>
        <tr><th>DEU</th><td>Germany</td><td>83.02</td><td>357.11</td><td>3693.2</td><td>Europe</td><td>NaN</td></tr>
        <tr><th>FRA</th><td>France</td><td
                >67.02</                  td                  >                  <                  td                  >640.68</                  td                  >                  <                  td                  >2582.49</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >1789-07-14</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  thursday                  >GBR</                  th                  >                  <                  td                  >United kingdom</                  td                  >                  <                  td                  >66.44</                  td                  >                  <                  td                  >242.5</                  td                  >                  <                  td                  >2631.23</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >ITA</                  thursday                  >                  <                  td                  >Italy</                  td                  >                  <                  td                  >60.36</                  td                  >                  <                  td                  >301.34</                  td                  >                  <                  td                  >1943.84</                  td                  >         
         <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >ARG</                  th                  >                  <                  td                  >Argentina</                  td                  >                  <                  td                  >44.94</                  td                  >                  <                  td                  >2780.4</                  td                  >                  <                  td                  >637.49</                  td                  >                  <                  td                  >S.America</                  td                  >                  <                  td                  >1816-07-09</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >DZA</                  th                  >                  <                  td                  >Algeria</                  td                  >                  <                  td                  >43.38</                  td                  >                  <                  td                  >2381.74</                  td                  >                  <                  td                  >167.56</                  td                  >                  <                  td                  >Africa</                  td                  >                  <                  td                  >1962-07-05</                  td                  >                  </                  tr                  >                  <            
      tr                  >                  <                  th                  >CAN</                  thursday                  >                  <                  td                  >Canada</                  td                  >                  <                  td                  >37.59</                  td                  >                  <                  td                  >9984.67</                  td                  >                  <                  td                  >1647.12</                  td                  >                  <                  td                  >Northward.America</                  td                  >                  <                  td                  >1867-07-01</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >AUS</                  thursday                  >                  <                  td                  >Commonwealth of australia</                  td                  >                  <                  td                  >25.47</                  td                  >                  <                  td                  >7692.02</                  td                  >                  <                  td                  >1408.68</                  td                  >                  <                  td                  >Oceania</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >KAZ</                  th                  >                  <                  td                  >Republic of kazakhstan</                  td                  >      
            <                  td                  >18.53</                  td                  >                  <                  td                  >2724.9</                  td                  >                  <                  td                  >159.41</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1991-12-16</                  td                  >                  </                  tr                  >                  </                  tbody                  >                  </                  table                  >                              

This file shows the DataFrame contents nicely. However, notice that you haven't obtained an entire web page. You've only output the data that corresponds to df in HTML format.

.to_html() won't create a file if you don't provide the optional parameter buf, which denotes the buffer to write to. If you leave this parameter out, then your code will return a string as it did with .to_csv() and .to_json().

Here are some other optional parameters:

  • header determines whether to save the column names.
  • index determines whether to save the row labels.
  • classes assigns cascading style sheet (CSS) classes.
  • render_links specifies whether to convert URLs to HTML links.
  • table_id assigns the CSS id to the table tag.
  • escape decides whether to convert the characters <, >, and & to HTML-safe strings.

You use parameters like these to specify different aspects of the resulting files or strings.
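Here's a minimal sketch of a few of these options, assuming a tiny, hypothetical two-row DataFrame rather than the tutorial's data. Because buf is omitted, .to_html() returns a string. The table_id and classes values ('countries' and 'compact') are arbitrary names chosen for illustration, and the value 'A&B' is there only to show what escape does:

```python
import pandas as pd

# Hypothetical sample data; 'A&B' contains a character that escape affects
df = pd.DataFrame(
    {"COUNTRY": ["China", "A&B"], "POP": [1398.72, 0.0]},
    index=["CHN", "XYZ"],
)

# No buf argument, so the HTML comes back as a string instead of a file
html = df.to_html(index=False, table_id="countries", classes="compact")

# escape=False leaves <, >, and & untouched
raw_html = df.to_html(index=False, escape=False)

print(isinstance(html, str))                 # True
print('id="countries"' in html)              # True
print("A&amp;B" in html, "A&B" in raw_html)  # True True
print("CHN" in html)                         # False: index=False dropped the row labels
```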

You can create a DataFrame object from a suitable HTML file using read_html(). It returns a list of DataFrame instances, one for each table it finds, so here you take the first element:

    >>> df = pd.read_html('data.html', index_col=0, parse_dates=['IND_DAY'])[0]

This is very similar to what you did when reading CSV files. You also have parameters that help you work with dates, missing values, precision, encoding, HTML parsers, and more.

Excel Files

You've already learned how to read and write Excel files with Pandas. However, there are a few more options worth considering. For one, when you use .to_excel(), you can specify the name of the target worksheet with the optional parameter sheet_name:

    >>> df = pd.DataFrame(data=data).T
    >>> df.to_excel('data.xlsx', sheet_name='COUNTRIES')

Here, you create a file data.xlsx with a worksheet called COUNTRIES that stores the data. The string 'data.xlsx' is the argument for the parameter excel_writer that defines the name of the Excel file or its path.

The optional parameters startrow and startcol both default to 0 and indicate the upper-left cell where the data should start being written:

    >>> df.to_excel('data-shifted.xlsx', sheet_name='COUNTRIES',
    ...             startrow=2, startcol=4)

Here, you specify that the table should start in the third row and the fifth column. You also used zero-based indexing, so the third row is denoted by 2 and the fifth column by 4.

Now the resulting worksheet looks like this:

[Image: the COUNTRIES worksheet in data-shifted.xlsx]

As you can see, the table starts in the third row (3) and the fifth column (E).
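The translation from the zero-based startrow and startcol values to Excel's 1-based, lettered cell references is plain arithmetic. Here's a minimal sketch with a hypothetical helper, excel_cell(), which isn't part of Pandas; it only shows which cell a given pair of arguments points to:

```python
def excel_cell(startrow: int, startcol: int) -> str:
    """Hypothetical helper: map zero-based (startrow, startcol) to an A1-style cell."""
    row = startrow + 1  # Excel rows are numbered from 1
    col = startcol + 1  # convert to 1-based before computing letters
    letters = ""
    # Excel columns use bijective base 26: A..Z, then AA, AB, ...
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row}"


print(excel_cell(2, 4))   # E3, matching startrow=2, startcol=4
print(excel_cell(0, 0))   # A1, the default position
print(excel_cell(0, 26))  # AA1
```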

read_excel() also has the optional parameter sheet_name that specifies which worksheets to read when loading data. It can take on one of the following values:

  • The zero-based index of the worksheet
  • The name of the worksheet
  • A list of indices or names to read multiple sheets
  • The value None to read all sheets

Here's how you would use this parameter in your code:

    >>> df = pd.read_excel('data.xlsx', sheet_name=0, index_col=0,
    ...                    parse_dates=['IND_DAY'])
    >>> df = pd.read_excel('data.xlsx', sheet_name='COUNTRIES', index_col=0,
    ...                    parse_dates=['IND_DAY'])

Both statements above create the same DataFrame because their sheet_name arguments refer to the same worksheet: sheet_name=0 selects it by its zero-based index, and sheet_name='COUNTRIES' selects it by name. The argument parse_dates=['IND_DAY'] tells Pandas to try to treat the values in this column as dates or times.

There are other optional parameters you can use with read_excel() and .to_excel() to determine the Excel engine, the encoding, the way to handle missing values and infinities, the method for writing column names and row labels, and so on.

SQL Files

Pandas IO tools can also read and write databases. In this next example, you'll write your data to a database called data.db. To get started, you'll need the SQLAlchemy package. To learn more about it, you can read the official ORM tutorial. You'll also need the database driver. Python has a built-in driver for SQLite.

You can install SQLAlchemy with pip:

    $ pip install sqlalchemy

You can also install it with Conda:

    $ conda install sqlalchemy

Once you have SQLAlchemy installed, import create_engine() and create a database engine:

    >>> from sqlalchemy import create_engine
    >>> engine = create_engine('sqlite:///data.db', echo=False)

Now that you have everything set up, the next step is to create a DataFrame object. It's convenient to specify the data types before you apply .to_sql().

    >>> dtypes = {'POP': 'float64', 'AREA': 'float64', 'GDP': 'float64',
    ...           'IND_DAY': 'datetime64[ns]'}
    >>> df = pd.DataFrame(data=data).T.astype(dtype=dtypes)
    >>> df.dtypes
    COUNTRY            object
    POP               float64
    AREA              float64
    GDP               float64
    CONT               object
    IND_DAY    datetime64[ns]
    dtype: object

.astype() is a very convenient method you can use to set multiple data types at once.
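If you want to see the dict form of .astype() in isolation, here's a sketch on a small, hypothetical DataFrame whose columns all start out as generic object values, much like the transposed data above. Columns missing from the dict, such as COUNTRY here, keep their dtype:

```python
import pandas as pd

# Hypothetical sample: every column starts as dtype object
raw = pd.DataFrame(
    {
        "COUNTRY": ["China", "India"],
        "POP": ["1398.72", "1351.16"],
        "IND_DAY": [None, "1947-08-15"],
    },
    dtype=object,
)

# One call converts several columns; COUNTRY isn't listed, so it stays object
df = raw.astype({"POP": "float64", "IND_DAY": "datetime64[ns]"})
print(df.dtypes)
```

The None in IND_DAY becomes NaT, the missing-value marker for datetime columns.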

Once you've created your DataFrame, you can save it to the database with .to_sql():

    >>> df.to_sql('data.db', con=engine, index_label='ID')

The first argument, 'data.db', is the name of the table to create; here it happens to match the name of the database file. The parameter con specifies the database connection or engine that you want to use. The optional parameter index_label specifies what to call the database column with the row labels. You'll often see it take on the value ID, Id, or id.

You should get the database data.db with a single table that looks like this:

[Image: the table written to data.db by .to_sql()]

The first column contains the row labels. To omit writing them into the database, pass index=False to .to_sql(). The other columns correspond to the columns of the DataFrame.
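To check what index_label and index=False do without installing SQLAlchemy, here's a sketch that assumes an in-memory database opened with Python's built-in sqlite3 module, which Pandas also accepts as con. The table names with_ids and without_ids and the two-row DataFrame are hypothetical:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "POP": [1398.72, 1351.16]},
    index=["CHN", "IND"],
)

con = sqlite3.connect(":memory:")  # throwaway in-memory database

# Row labels are written to a column called ID
df.to_sql("with_ids", con=con, index_label="ID")
# Row labels are dropped entirely
df.to_sql("without_ids", con=con, index=False)


def column_names(con, table):
    # PRAGMA table_info yields one row per column; the name is the second field
    return [row[1] for row in con.execute(f"PRAGMA table_info({table})")]


with_cols = column_names(con, "with_ids")
without_cols = column_names(con, "without_ids")
con.close()

print(with_cols)     # ['ID', 'COUNTRY', 'POP']
print(without_cols)  # ['COUNTRY', 'POP']
```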

There are a few more optional parameters. For example, you can use schema to specify the database schema and dtype to determine the types of the database columns. You can also use if_exists, which says what to do if a table with the same name already exists:

  • if_exists='fail' raises a ValueError and is the default.
  • if_exists='replace' drops the table and inserts new values.
  • if_exists='append' inserts new values into the table.
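The three if_exists behaviors are easy to observe on a throwaway table. This sketch again assumes an in-memory sqlite3 database and a hypothetical table called countries, and it counts the rows after each write:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"COUNTRY": ["China", "India"]}, index=["CHN", "IND"])
con = sqlite3.connect(":memory:")


def row_count():
    return con.execute('SELECT COUNT(*) FROM "countries"').fetchone()[0]


df.to_sql("countries", con=con, index_label="ID")  # creates the table (2 rows)

try:
    # Default if_exists='fail': the table already exists, so this raises
    df.to_sql("countries", con=con, index_label="ID")
    raised = False
except ValueError:
    raised = True

df.to_sql("countries", con=con, index_label="ID", if_exists="append")
after_append = row_count()   # the same two rows were added again

df.to_sql("countries", con=con, index_label="ID", if_exists="replace")
after_replace = row_count()  # the table was dropped and recreated
con.close()

print(raised, after_append, after_replace)  # True 4 2
```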

You can load the data from the database with read_sql():

    >>> df = pd.read_sql('data.db', con=engine, index_col='ID')
    >>> df
            COUNTRY      POP      AREA       GDP       CONT    IND_DAY
    ID
    CHN       China  1398.72   9596.96  12234.78       Asia        NaT
    IND       India  1351.16   3287.26   2575.67       Asia 1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America 1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26
    RUS      Russia   146.79  17098.25   1530.75       None 1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia        NaT
    DEU     Germany    83.02    357.11   3693.20     Europe        NaT
    FRA      France    67.02    640.68   2582.49     Europe 1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe        NaT
    ITA       Italy    60.36    301.34   1943.84     Europe        NaT
    ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America 1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16

The parameter index_col specifies the name of the column with the row labels. Note that the output now has an extra line after the header that starts with ID: it displays the name of the index rather than a row of data. You can remove it with the following line of code:

    >>> df.index.name = None
    >>> df
            COUNTRY      POP      AREA       GDP       CONT    IND_DAY
    CHN       China  1398.72   9596.96  12234.78       Asia        NaT
    IND       India  1351.16   3287.26   2575.67       Asia 1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America 1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26
    RUS      Russia   146.79  17098.25   1530.75       None 1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia        NaT
    DEU     Germany    83.02    357.11   3693.20     Europe        NaT
    FRA      France    67.02    640.68   2582.49     Europe 1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe        NaT
    ITA       Italy    60.36    301.34   1943.84     Europe        NaT
    ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America 1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16

Now you have the same DataFrame object as before.

Note that the continent for Russia is now None instead of nan. If you want to fill the missing values with nan, then you can use .fillna():

    >>> df.fillna(value=float('nan'), inplace=True)

.fillna() replaces all missing values with whatever you pass to value. Here, you passed float('nan'), which says to fill all missing values with nan.
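The difference between None and nan is easiest to see on a single object column. Here's a sketch with a hypothetical two-element Series standing in for the CONT column:

```python
import math

import pandas as pd

# Hypothetical stand-in for the CONT column read from the database
cont = pd.Series(["Asia", None], index=["CHN", "RUS"], dtype=object)

filled = cont.fillna(value=float("nan"))

print(cont["RUS"] is None)        # True: the original holds None
print(math.isnan(filled["RUS"]))  # True: the filled copy holds nan
print(filled["CHN"])              # Asia: present values are untouched
```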

Also note that you didn't have to pass parse_dates=['IND_DAY'] to read_sql(). That's because your database was able to detect that the last column contains dates. However, you can pass parse_dates if you'd like. You'll get the same results.

There are other functions that you can use to read databases, like read_sql_table() and read_sql_query(). Feel free to try them out!
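As one example, read_sql_query() runs an SQL statement and returns its result as a DataFrame. This sketch assumes an in-memory sqlite3 database, a hypothetical table called countries, and SQLite's ? placeholder style for params. Because SQLite has no dedicated date type, the dates here are written as text and converted back with parse_dates:

```python
import sqlite3

import pandas as pd

# Hypothetical sample; IND_DAY holds ISO date strings (or None)
df = pd.DataFrame(
    {
        "COUNTRY": ["China", "India", "France"],
        "CONT": ["Asia", "Asia", "Europe"],
        "IND_DAY": [None, "1947-08-15", "1789-07-14"],
    },
    index=["CHN", "IND", "FRA"],
)

con = sqlite3.connect(":memory:")
df.to_sql("countries", con=con, index_label="ID")

# Filter in SQL, pass the value separately, and restore the index and dates
asia = pd.read_sql_query(
    "SELECT * FROM countries WHERE CONT = ? ORDER BY ID",
    con,
    params=("Asia",),
    index_col="ID",
    parse_dates=["IND_DAY"],
)
con.close()
print(asia)
```

Passing the filter value through params, rather than formatting it into the SQL string, lets the driver handle quoting.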

Pickle Files

Pickling is the act of converting Python objects into byte streams. Unpickling is the inverse procedure. Python pickle files are the binary files that keep the data and hierarchy of Python objects. They usually have the extension .pickle or .pkl.
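The same round trip works on ordinary Python objects with the standard library's pickle module. Here's a minimal sketch with a hypothetical dictionary:

```python
import pickle

record = {"CHN": ("China", 1398.72, "Asia")}  # hypothetical sample object

blob = pickle.dumps(record)    # pickling: object -> bytes
restored = pickle.loads(blob)  # unpickling: bytes -> equivalent object

print(type(blob))          # <class 'bytes'>
print(restored == record)  # True
```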

You can save your DataFrame in a pickle file with .to_pickle():

    >>> dtypes = {'POP': 'float64', 'AREA': 'float64', 'GDP': 'float64',
    ...           'IND_DAY': 'datetime64[ns]'}
    >>> df = pd.DataFrame(data=data).T.astype(dtype=dtypes)
    >>> df.to_pickle('data.pickle')

As you did with databases, it can be convenient to specify the data types first. Then, you create a file data.pickle to contain your data. You could also pass an integer value to the optional parameter protocol, which specifies the protocol of the pickler.

You can get the data from a pickle file with read_pickle():

    >>> df = pd.read_pickle('data.pickle')
    >>> df
            COUNTRY      POP      AREA       GDP       CONT    IND_DAY
    CHN       China  1398.72   9596.96  12234.78       Asia        NaT
    IND       India  1351.16   3287.26   2575.67       Asia 1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America 1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26
    RUS      Russia   146.79  17098.25   1530.75        NaN 1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia        NaT
    DEU     Germany    83.02    357.11   3693.20     Europe        NaT
    FRA      France    67.02    640.68   2582.49     Europe 1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe        NaT
    ITA       Italy    60.36    301.34   1943.84     Europe        NaT
    ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America 1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16

read_pickle() returns the DataFrame with the stored data. You can also check the data types:

    >>> df.dtypes
    COUNTRY            object
    POP               float64
    AREA              float64
    GDP               float64
    CONT               object
    IND_DAY    datetime64[ns]
    dtype: object

These are the same ones that you specified before using .to_pickle().
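To confirm that a pickle round trip preserves both values and dtypes, here's a sketch that writes a small, hypothetical DataFrame to a temporary directory. It also passes protocol; pickle.HIGHEST_PROTOCOL is just one possible choice, and omitting the argument works too:

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical sample with a float column and a datetime column containing NaT
df = pd.DataFrame(
    {"POP": [1398.72, 1351.16], "IND_DAY": pd.to_datetime([None, "1947-08-15"])},
    index=["CHN", "IND"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")
    # protocol is optional; HIGHEST_PROTOCOL is only one possible value
    df.to_pickle(path, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pd.read_pickle(path)

print(restored.dtypes)
print(restored.equals(df))  # True
```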

As a word of caution, you should always beware of loading pickles from untrusted sources. This can be dangerous! When you unpickle an untrustworthy file, it could execute arbitrary code on your machine, gain remote access to your computer, or otherwise exploit your device.
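To see why, here's a deliberately harmless sketch. The hypothetical class Payload uses __reduce__() to tell pickle which callable to invoke when the bytes are loaded. Here that callable is eval(), so merely calling pickle.loads() evaluates an expression; a malicious file could name a far more dangerous callable instead:

```python
import pickle


class Payload:
    def __reduce__(self):
        # pickle records "call eval('6 * 7')" instead of the object's data
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval() runs here, during loading
print(result)  # 42
```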

Working With Big Data

If your files are too large for saving or processing, then there are several approaches you can take to reduce the required disk space:

  • Compress your files
  • Choose only the columns you want
  • Omit the rows you don't need
  • Force the use of less precise data types
  • Split the data into chunks

You'll take a look at each of these techniques in turn.

Compress and Decompress Files

You can create an archive file like you would a regular one, with the addition of a suffix that corresponds to the desired compression type:

  • '.gz'
  • '.bz2'
  • '.zip'
  • '.xz'

Pandas can deduce the compression type by itself:

    >>> df = pd.DataFrame(data=data).T
    >>> df.to_csv('data.csv.zip')

Here, you create a compressed .csv file as an archive. The size of the regular .csv file is 1048 bytes, while the compressed file only has 766 bytes.
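The exact savings depend on the data, so the byte counts above are specific to this example. Here's a sketch you can run to compare sizes yourself; it uses a hypothetical, highly repetitive DataFrame (which compresses especially well) and a temporary directory:

```python
import os
import tempfile

import pandas as pd

# Hypothetical, highly repetitive data
df = pd.DataFrame({"CONT": ["Asia", "Europe"] * 500, "POP": [1.5, 2.5] * 500})

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "data.csv")
    zipped = os.path.join(tmp, "data.csv.zip")
    df.to_csv(plain)
    df.to_csv(zipped)  # compression inferred from the .zip suffix
    plain_size = os.path.getsize(plain)
    zipped_size = os.path.getsize(zipped)
    back = pd.read_csv(zipped, index_col=0)  # decompressed transparently

print(plain_size, zipped_size)
```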

You can open this compressed file as usual with the Pandas read_csv() function:

    >>> df = pd.read_csv('data.csv.zip', index_col=0,
    ...                  parse_dates=['IND_DAY'])
    >>> df
            COUNTRY      POP      AREA       GDP       CONT    IND_DAY
    CHN       China  1398.72   9596.96  12234.78       Asia        NaT
    IND       India  1351.16   3287.26   2575.67       Asia 1947-08-15
    USA          US   329.74   9833.52  19485.39  N.America 1776-07-04
    IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17
    BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07
    PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-14
    NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01
    BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26
    RUS      Russia   146.79  17098.25   1530.75        NaN 1992-06-12
    MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16
    JPN       Japan   126.22    377.97   4872.42       Asia        NaT
    DEU     Germany    83.02    357.11   3693.20     Europe        NaT
    FRA      France    67.02    640.68   2582.49     Europe 1789-07-14
    GBR          UK    66.44    242.50   2631.23     Europe        NaT
    ITA       Italy    60.36    301.34   1943.84     Europe        NaT
    ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09
    DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05
    CAN      Canada    37.59   9984.67   1647.12  N.America 1867-07-01
    AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT
    KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16

read_csv() decompresses the file before reading it into a DataFrame.

You can specify the type of compression with the optional parameter compression, which can take on any of the following values:

  • 'infer'
  • 'gzip'
  • 'bz2'
  • 'zip'
  • 'xz'
  • None

The default value compression='infer' indicates that Pandas should deduce the compression type from the file extension.
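As a minimal, self-contained sketch of that inference (assuming a small hypothetical two-row DataFrame and a temporary directory instead of the tutorial's data.csv), the following writes a .gz file without naming the compression, reads it back both with inference and with an explicit compression='gzip', and checks that the file on disk starts with the gzip magic bytes:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the tutorial's data
df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16]},
    index=['CHN', 'IND'],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv.gz')
    # compression='infer' is the default, so the .gz extension selects gzip
    df.to_csv(path)
    # The same inference happens when reading the file back
    df_inferred = pd.read_csv(path, index_col=0)
    # Naming the compression type explicitly gives the same result
    df_explicit = pd.read_csv(path, index_col=0, compression='gzip')
    # A gzip file always begins with the two bytes 0x1f 0x8b
    with open(path, 'rb') as f:
        magic = f.read(2)

print(magic, df_inferred.equals(df_explicit))
```

Relying on the extension keeps the calls short; passing compression explicitly is useful when a file name doesn't follow the usual extension conventions.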

Here's how you would compress a pickle file:

```python
>>> df = pd.DataFrame(data=data).T
>>> df.to_pickle('data.pickle.compress', compression='gzip')
```

You should get the file data.pickle.compress that you can later decompress and read:

```python
>>> df = pd.read_pickle('data.pickle.compress', compression='gzip')
```

df again corresponds to the DataFrame with the same data as before.

You can give the other compression methods a try, as well. If you're using pickle files, then keep in mind that the .zip format supports reading only.
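Here is a short sketch of trying the other methods, assuming a small hypothetical DataFrame and a temporary directory. It round-trips the data through gzip, bz2, and xz pickles and keeps each result for comparison. The .zip format is left out in line with the note above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the tutorial's data
df = pd.DataFrame({'POP': [1398.72, 1351.16], 'AREA': [9596.96, 3287.26]},
                  index=['CHN', 'IND'])

round_trips = {}
with tempfile.TemporaryDirectory() as tmp:
    for method in ['gzip', 'bz2', 'xz']:
        path = os.path.join(tmp, f'data.pickle.{method}')
        # Write with one compression method...
        df.to_pickle(path, compression=method)
        # ...and read back with the same method named explicitly
        round_trips[method] = pd.read_pickle(path, compression=method)

for method, result in round_trips.items():
    print(method, result.equals(df))
```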

Choose Columns

The Pandas read_csv() and read_excel() functions have the optional parameter usecols that you can use to specify the columns you want to load from the file. You can pass the list of column names as the corresponding argument:

```python
>>> df = pd.read_csv('data.csv', usecols=['COUNTRY', 'AREA'])
>>> df
       COUNTRY      AREA
0        China   9596.96
1        India   3287.26
2           US   9833.52
3    Indonesia   1910.93
4       Brazil   8515.77
5     Pakistan    881.91
6      Nigeria    923.77
7   Bangladesh    147.57
8       Russia  17098.25
9       Mexico   1964.38
10       Japan    377.97
11     Germany    357.11
12      France    640.68
13          UK    242.50
14       Italy    301.34
15   Argentina   2780.40
16     Algeria   2381.74
17      Canada   9984.67
18   Australia   7692.02
19  Kazakhstan   2724.90
```

Now you have a DataFrame that contains less data than before. Here, there are only the names of the countries and their areas.

Instead of the column names, you can also pass their indices:

```python
>>> df = pd.read_csv('data.csv', index_col=0, usecols=[0, 1, 3])
>>> df
        COUNTRY      AREA
CHN       China   9596.96
IND       India   3287.26
USA          US   9833.52
IDN   Indonesia   1910.93
BRA      Brazil   8515.77
PAK    Pakistan    881.91
NGA     Nigeria    923.77
BGD  Bangladesh    147.57
RUS      Russia  17098.25
MEX      Mexico   1964.38
JPN       Japan    377.97
DEU     Germany    357.11
FRA      France    640.68
GBR          UK    242.50
ITA       Italy    301.34
ARG   Argentina   2780.40
DZA     Algeria   2381.74
CAN      Canada   9984.67
AUS   Australia   7692.02
KAZ  Kazakhstan   2724.90
```

Compare these results with the contents of the file data.csv:

```
,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
CHN,China,1398.72,9596.96,12234.78,Asia,
IND,India,1351.16,3287.26,2575.67,Asia,1947-08-15
USA,US,329.74,9833.52,19485.39,N.America,1776-07-04
IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17
BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07
PAK,Pakistan,205.71,881.91,302.14,Asia,1947-08-14
NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01
BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26
RUS,Russia,146.79,17098.25,1530.75,,1992-06-12
MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16
JPN,Japan,126.22,377.97,4872.42,Asia,
DEU,Germany,83.02,357.11,3693.2,Europe,
FRA,France,67.02,640.68,2582.49,Europe,1789-07-14
GBR,UK,66.44,242.5,2631.23,Europe,
ITA,Italy,60.36,301.34,1943.84,Europe,
ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09
DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05
CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01
AUS,Australia,25.47,7692.02,1408.68,Oceania,
KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16
```

You can see the following columns:

  • The column at index 0 contains the row labels.
  • The column at index 1 contains the country names.
  • The column at index 3 contains the areas.
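To try both forms of usecols without the tutorial's data.csv, here is a minimal sketch that assumes an in-memory file (io.StringIO) holding two rows in the same layout. Selecting by name leaves out the row labels, so Pandas adds a default integer index. Selecting by position keeps column 0 so that index_col=0 can use it:

```python
import io

import pandas as pd

# Two rows in the same layout as data.csv (column 0 holds the row labels)
csv_text = (
    ',COUNTRY,POP,AREA,GDP,CONT,IND_DAY\n'
    'CHN,China,1398.72,9596.96,12234.78,Asia,\n'
    'IND,India,1351.16,3287.26,2575.67,Asia,1947-08-15\n'
)

# By name: the row-label column isn't loaded, so the index is 0, 1, ...
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['COUNTRY', 'AREA'])

# By position: 0 = row labels, 1 = COUNTRY, 3 = AREA
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0,
                          usecols=[0, 1, 3])

print(by_name, by_position, sep='\n\n')
```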

Similarly, read_sql() has the optional parameter columns that takes a list of column names to read:

```python
>>> df = pd.read_sql('data.db', con=engine, index_col='ID',
...                  columns=['COUNTRY', 'AREA'])
>>> df.index.name = None
>>> df
        COUNTRY      AREA
CHN       China   9596.96
IND       India   3287.26
USA          US   9833.52
IDN   Indonesia   1910.93
BRA      Brazil   8515.77
PAK    Pakistan    881.91
NGA     Nigeria    923.77
BGD  Bangladesh    147.57
RUS      Russia  17098.25
MEX      Mexico   1964.38
JPN       Japan    377.97
DEU     Germany    357.11
FRA      France    640.68
GBR          UK    242.50
ITA       Italy    301.34
ARG   Argentina   2780.40
DZA     Algeria   2381.74
CAN      Canada   9984.67
AUS   Australia   7692.02
KAZ  Kazakhstan   2724.90
```

Again, the DataFrame only contains the columns with the names of the countries and areas. If columns is None or omitted, then all of the columns will be read, as you saw before. The default behavior is columns=None.
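The example above depends on a SQLAlchemy engine set up earlier in the tutorial. Here is a self-contained sketch that instead assumes Python's built-in sqlite3 module, an in-memory database, and a hypothetical table named data. The pandas documentation says columns applies only when reading a whole table, so this sketch takes a different route and names the wanted columns in the SQL query itself:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the tutorial's data, with the index named ID
df = pd.DataFrame(
    {'COUNTRY': ['China', 'India'], 'POP': [1398.72, 1351.16],
     'AREA': [9596.96, 3287.26]},
    index=pd.Index(['CHN', 'IND'], name='ID'),
)

# An in-memory SQLite database replaces the SQLAlchemy engine here
con = sqlite3.connect(':memory:')
df.to_sql('data', con=con)  # the index is stored as the ID column

# Select only the ID, COUNTRY, and AREA columns in the query
df_subset = pd.read_sql('SELECT ID, COUNTRY, AREA FROM data', con=con,
                        index_col='ID')
df_subset.index.name = None
con.close()

print(df_subset)
```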

Omit Rows

When you test an algorithm for data processing or machine learning, you often don't need the entire dataset. It's convenient to load only a subset of the data to speed up the process. The Pandas read_csv() and read_excel() functions have some optional parameters that allow you to select which rows you want to load:

  • skiprows: either the number of rows to skip at the beginning of the file if it's an integer, or the zero-based indices of the rows to skip if it's a list-like object
  • skipfooter: the number of rows to skip at the end of the file
  • nrows: the number of rows to read
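Here is a small sketch of all three parameters, assuming a hypothetical four-row in-memory CSV. Note that an integer skiprows counts lines from the very start of the file, header included, so the sketch adds a hypothetical preamble line for it to skip. Note also that the pandas documentation lists skipfooter as unsupported by the default C engine, so the sketch passes engine='python':

```python
import io

import pandas as pd

csv_text = (
    ',COUNTRY,POP\n'
    'CHN,China,1398.72\n'
    'IND,India,1351.16\n'
    'USA,US,329.74\n'
    'IDN,Indonesia,268.07\n'
)

# nrows: read only the first two data rows (the header isn't counted)
first_two = pd.read_csv(io.StringIO(csv_text), index_col=0, nrows=2)

# skipfooter: drop the last row; this needs the Python parsing engine
without_last = pd.read_csv(io.StringIO(csv_text), index_col=0,
                           skipfooter=1, engine='python')

# skiprows as an integer: skip a hypothetical preamble line before the header
with_preamble = 'Source: hypothetical export\n' + csv_text
skipped = pd.read_csv(io.StringIO(with_preamble), index_col=0, skiprows=1)

print(first_two, without_last, skipped, sep='\n\n')
```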

Here's how you would skip rows with odd zero-based indices, keeping the even ones:

```python
>>> df = pd.read_csv('data.csv', index_col=0, skiprows=range(1, 20, 2))
>>> df
        COUNTRY      POP     AREA      GDP       CONT     IND_DAY
IND       India  1351.16  3287.26  2575.67       Asia  1947-08-15
IDN   Indonesia   268.07  1910.93  1015.54       Asia  1945-08-17
PAK    Pakistan   205.71   881.91   302.14       Asia  1947-08-14
BGD  Bangladesh   167.09   147.57   245.63       Asia  1971-03-26
MEX      Mexico   126.58  1964.38  1158.23  N.America  1810-09-16
DEU     Germany    83.02   357.11  3693.20     Europe         NaN
GBR          UK    66.44   242.50  2631.23     Europe         NaN
ARG   Argentina    44.94  2780.40   637.49  S.America  1816-07-09
CAN      Canada    37.59  9984.67  1647.12  N.America  1867-07-01
KAZ  Kazakhstan    18.53  2724.90   159.41       Asia  1991-12-16
```

In this example, skiprows is range(1, 20, 2) and corresponds to the values 1, 3, …, 19. Instances of the Python built-in class range behave like sequences. The first row of the file data.csv is the header row. It has the index 0, so Pandas loads it in. The second row, with index 1, corresponds to the label CHN, and Pandas skips it. The third row, with index 2 and label IND, is loaded, and so on.

If you want to choose rows randomly, then skiprows can be a list or NumPy array with pseudo-random numbers, obtained either with pure Python or with NumPy.
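As a minimal sketch of random row selection, assuming a hypothetical 20-row in-memory CSV and NumPy's Generator API with a fixed seed, the code below draws 15 distinct data-line numbers to skip and keeps 5 random rows. Data rows occupy file lines 1 through 20 because line 0 is the header, which must not be skipped:

```python
import io

import numpy as np
import pandas as pd

n_rows = 20  # hypothetical number of data rows, not counting the header
csv_text = ',VALUE\n' + ''.join(f'R{i},{i}\n' for i in range(n_rows))

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
# Choose 15 distinct line numbers from 1..n_rows to skip (line 0 is the header)
skip = rng.choice(np.arange(1, n_rows + 1), size=15, replace=False)

sample = pd.read_csv(io.StringIO(csv_text), index_col=0, skiprows=skip)
print(sample)
```

Sampling the rows to skip, rather than the rows to keep, fits how skiprows works. If you only know how many rows you want, you can draw n_rows minus that count.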

Force Less Precise Data Types

If you're okay with less precise data types, then you can potentially save a significant amount of memory! First, get the data types with .dtypes again:

```python
>>> df = pd.read_csv('data.csv', index_col=0, parse_dates=['IND_DAY'])
>>> df.dtypes
COUNTRY            object
POP               float64
AREA              float64
GDP               float64
CONT               object
IND_DAY    datetime64[ns]
dtype: object
```

The columns with the floating-point numbers are 64-bit floats. Each number of this type float64 consumes 64 bits, or 8 bytes. Each column has 20 numbers and requires 160 bytes. You can verify this with .memory_usage():

```python
>>> df.memory_usage()
Index      160
COUNTRY    160
POP        160
AREA       160
GDP        160
CONT       160
IND_DAY    160
dtype: int64
```

.memory_usage() returns an instance of Series with the memory usage of each column in bytes. You can conveniently combine it with .loc[] and .sum() to get the memory for a group of columns:

```python
>>> df.loc[:, ['POP', 'AREA', 'GDP']].memory_usage(index=False).sum()
480
```

This example shows how you can combine the numeric columns 'POP', 'AREA', and 'GDP' to get their total memory requirement. The argument index=False excludes data for row labels from the resulting Series object. For these three columns, you'll need 480 bytes.

You can also extract the data values in the form of a NumPy array with .to_numpy() or .values. Then, use the .nbytes attribute to get the total bytes consumed by the items of the array:

```python
>>> df.loc[:, ['POP', 'AREA', 'GDP']].to_numpy().nbytes
480
```

The result is the same 480 bytes. So, how do you save memory?

In this case, you can specify that your numeric columns 'POP', 'AREA', and 'GDP' should have the type float32. Use the optional parameter dtype to do this:

```python
>>> dtypes = {'POP': 'float32', 'AREA': 'float32', 'GDP': 'float32'}
>>> df = pd.read_csv('data.csv', index_col=0, dtype=dtypes,
...                  parse_dates=['IND_DAY'])
```

The dictionary dtypes specifies the desired data types for each column. It's passed to the Pandas read_csv() function as the argument that corresponds to the parameter dtype.

Now you can verify that each numeric column needs 80 bytes, or 4 bytes per item:

```python
>>> df.dtypes
COUNTRY            object
POP               float32
AREA              float32
GDP               float32
CONT               object
IND_DAY    datetime64[ns]
dtype: object
>>> df.memory_usage()
Index      160
COUNTRY    160
POP         80
AREA        80
GDP         80
CONT       160
IND_DAY    160
dtype: int64
>>> df.loc[:, ['POP', 'AREA', 'GDP']].memory_usage(index=False).sum()
240
>>> df.loc[:, ['POP', 'AREA', 'GDP']].to_numpy().nbytes
240
```

Each value is a floating-point number of 32 bits, or 4 bytes. The three numeric columns contain 20 items each. In total, you'll need 240 bytes of memory when you work with the type float32. This is half the size of the 480 bytes you'd need to work with float64.

In addition to saving memory, you can significantly reduce the time required to process data by using float32 instead of float64 in some cases.
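If the data is already loaded, .astype() can downcast it after the fact. This sketch assumes hypothetical random values shaped like the tutorial's three numeric columns (20 rows by 3 columns). It confirms the 480-byte and 240-byte figures and measures the rounding error that float32 introduces, since float32 keeps only about seven significant decimal digits:

```python
import numpy as np
import pandas as pd

# Hypothetical values with the same shape as POP, AREA, and GDP
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(10, 20000, size=(20, 3)),
                  columns=['POP', 'AREA', 'GDP'])

# Downcast after loading, instead of passing dtype= to read_csv()
df32 = df.astype('float32')

bytes64 = df.to_numpy().nbytes    # 20 rows * 3 columns * 8 bytes
bytes32 = df32.to_numpy().nbytes  # 20 rows * 3 columns * 4 bytes

# Largest absolute difference caused by rounding to float32
max_error = (df32.astype('float64') - df).abs().to_numpy().max()

print(bytes64, bytes32, max_error)
```

Passing dtype to read_csv(), as shown above, avoids ever holding the float64 copy. The .astype() route is mainly useful when the data comes from somewhere else.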

Use Chunks to Iterate Through Files

Another way to deal with very large datasets is to split the data into smaller chunks and process one chunk at a time. If you use read_csv(), read_json(), or read_sql(), then you can specify the optional parameter chunksize:

```python
>>> data_chunk = pd.read_csv('data.csv', index_col=0, chunksize=8)
>>> type(data_chunk)
<class 'pandas.io.parsers.TextFileReader'>
>>> hasattr(data_chunk, '__iter__')
True
>>> hasattr(data_chunk, '__next__')
True
```

chunksize defaults to None and can take on an integer value that indicates the number of rows in a single chunk. When chunksize is an integer, read_csv() returns an iterable that you can use in a for loop to get and process only a fragment of the dataset in each iteration:

```python
>>> for df_chunk in pd.read_csv('data.csv', index_col=0, chunksize=8):
...     print(df_chunk, end='\n\n')
...     print('memory:', df_chunk.memory_usage().sum(), 'bytes',
...           end='\n\n\n')
...
        COUNTRY      POP     AREA       GDP       CONT     IND_DAY
CHN       China  1398.72  9596.96  12234.78       Asia         NaN
IND       India  1351.16  3287.26   2575.67       Asia  1947-08-15
USA          US   329.74  9833.52  19485.39  N.America  1776-07-04
IDN   Indonesia   268.07  1910.93   1015.54       Asia  1945-08-17
BRA      Brazil   210.32  8515.77   2055.51  S.America  1822-09-07
PAK    Pakistan   205.71   881.91    302.14       Asia  1947-08-14
NGA     Nigeria   200.96   923.77    375.77     Africa  1960-10-01
BGD  Bangladesh   167.09   147.57    245.63       Asia  1971-03-26

memory: 448 bytes


       COUNTRY     POP      AREA      GDP       CONT     IND_DAY
RUS     Russia  146.79  17098.25  1530.75        NaN  1992-06-12
MEX     Mexico  126.58   1964.38  1158.23  N.America  1810-09-16
JPN      Japan  126.22    377.97  4872.42       Asia         NaN
DEU    Germany   83.02    357.11  3693.20     Europe         NaN
FRA     France   67.02    640.68  2582.49     Europe  1789-07-14
GBR         UK   66.44    242.50  2631.23     Europe         NaN
ITA      Italy   60.36    301.34  1943.84     Europe         NaN
ARG  Argentina   44.94   2780.40   637.49  S.America  1816-07-09

memory: 448 bytes


        COUNTRY    POP     AREA      GDP       CONT     IND_DAY
DZA     Algeria  43.38  2381.74   167.56     Africa  1962-07-05
CAN      Canada  37.59  9984.67  1647.12  N.America  1867-07-01
AUS   Australia  25.47  7692.02  1408.68    Oceania         NaN
KAZ  Kazakhstan  18.53  2724.90   159.41       Asia  1991-12-16

memory: 224 bytes
```

In this example, the chunksize is 8. The first iteration of the for loop returns a DataFrame with only the first eight rows of the dataset. The second iteration returns another DataFrame with the next eight rows. The third and last iteration returns the remaining four rows.

In each iteration, you get and process a DataFrame with the number of rows equal to chunksize. The final iteration can have fewer rows than chunksize. You can use this functionality to control the amount of memory required to process data and keep that amount reasonably small.
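A common use of chunks is to compute an aggregate without building the full DataFrame. Here is a minimal sketch that assumes a hypothetical five-row in-memory CSV and chunksize=2. It adds up the POP column chunk by chunk and records each chunk's length, so the shorter final chunk is visible:

```python
import io

import pandas as pd

csv_text = (
    ',COUNTRY,POP\n'
    'CHN,China,1398.72\n'
    'IND,India,1351.16\n'
    'USA,US,329.74\n'
    'IDN,Indonesia,268.07\n'
    'BRA,Brazil,210.32\n'
)

total_pop = 0.0
chunk_lengths = []
# Each iteration yields a DataFrame with at most 2 rows
for chunk in pd.read_csv(io.StringIO(csv_text), index_col=0, chunksize=2):
    chunk_lengths.append(len(chunk))
    total_pop += chunk['POP'].sum()  # accumulate a running total

print(chunk_lengths, round(total_pop, 2))
```

Only the running total and the current chunk need to stay alive between iterations, which is what keeps memory use bounded for large files on disk.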

Conclusion

You now know how to save the data and labels from Pandas DataFrame objects to different kinds of files. You also know how to load your data from files and create DataFrame objects.

You've used the Pandas read_csv() and .to_csv() methods to read and write CSV files. You also used similar methods to read and write Excel, JSON, HTML, SQL, and pickle files. These functions are very convenient and widely used. They allow you to save or load your data in a single function or method call.

You've also learned how to save time, memory, and disk space when working with large data files:

  • Compress or decompress files
  • Choose the rows and columns you want to load
  • Use less precise data types
  • Split data into chunks and process them one by one

You've mastered a significant step in the machine learning and data science process! If you have any questions or comments, then please put them in the comments section below.
