Python Data Science Tutorial: Processing JSON Data

A JSON file stores data as text in a human-readable format. JSON stands for JavaScript Object Notation. pandas can read JSON files with the read_json function.

Input Data

Create a JSON file by copying the data below into a text editor such as Notepad. Save the file with a .json extension, choosing All Files (*.*) as the file type.

{
   "id":["1","2","3","4","5","6","7","8"],
   "name":["rick","dan","michelle","ryan","gary","nina","simon","guru"],
   "salary":["623.3","515.2","611","729","843.25","578","632.8","722.5"],
   "startdate":["1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
      "7/30/2013","6/17/2014"],
   "dept":["it","operations","it","hr","finance","it","operations","finance"]
}
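
As an alternative to a text editor, the same file could be written from Python with the standard json module. The following is a minimal sketch, not part of the original steps: the file name input.json and the temporary directory are assumptions made for illustration.

```python
import json
import os
import tempfile

# Same columns as the sample data above; every value is kept as a string.
records = {
    "id": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "name": ["rick", "dan", "michelle", "ryan", "gary", "nina", "simon", "guru"],
    "salary": ["623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5"],
    "startdate": ["1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
                  "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014"],
    "dept": ["it", "operations", "it", "hr", "finance", "it", "operations", "finance"],
}

# Hypothetical location: a fresh temporary directory, so no existing file is overwritten.
json_path = os.path.join(tempfile.mkdtemp(), "input.json")

with open(json_path, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=3)

# Read the file back to confirm that it parses as valid JSON.
with open(json_path, encoding="utf-8") as f:
    loaded = json.load(f)

print(sorted(loaded))
```

Writing the file with json.dump avoids hand-typed syntax slips such as a missing comma between keys.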

Read the JSON File

The read_json function of the pandas library can be used to read the JSON file into a pandas DataFrame.

import pandas as pd

# replace 'path/input.json' with the location of the file saved above
data = pd.read_json('path/input.json')
print(data)

When we execute the above code, it produces the following result.

         dept  id      name  salary   startdate
0          it   1      rick  623.30    1/1/2012
1  operations   2       dan  515.20   9/23/2013
2          it   3  michelle  611.00  11/15/2014
3          hr   4      ryan  729.00   5/11/2014
4     finance   5      gary  843.25   3/27/2015
5          it   6      nina  578.00   5/21/2013
6  operations   7     simon  632.80   7/30/2013
7     finance   8      guru  722.50   6/17/2014
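
In the result above, startdate is still shown as text in month/day/year form. If numeric and date types are needed, they can be set explicitly after reading. This is a minimal sketch under assumptions: it uses a three-row excerpt of the sample data held in memory through io.StringIO rather than a file on disk, and it converts the columns with pd.to_numeric and pd.to_datetime instead of relying on whatever read_json infers.

```python
import io

import pandas as pd

# A three-row excerpt of the sample data, held in memory instead of a file.
json_text = """
{
   "id": ["1", "2", "3"],
   "name": ["rick", "dan", "michelle"],
   "salary": ["623.3", "515.2", "611"],
   "startdate": ["1/1/2012", "9/23/2013", "11/15/2014"],
   "dept": ["it", "operations", "it"]
}
"""

# read_json accepts a file-like object as well as a path.
data = pd.read_json(io.StringIO(json_text))

# Convert explicitly so the result does not depend on type inference.
data["salary"] = pd.to_numeric(data["salary"])
data["startdate"] = pd.to_datetime(data["startdate"], format="%m/%d/%Y")

print(data.dtypes)
```

With startdate stored as a datetime column, date-based operations such as sorting by start date or extracting the year through data["startdate"].dt.year become available.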

Reading Specific Columns and Rows

As with the CSV file in the previous chapter, once the JSON file has been read into a DataFrame we can select specific columns and rows from it. We use the multi-axes indexing method .loc[] for this purpose. Here we display the salary and name columns for some of the rows.

import pandas as pd
data = pd.read_json('path/input.json')

# use the multi-axes indexing function: row labels first, then column labels
print(data.loc[[1,3,5],['salary','name']])

When we execute the above code, it produces the following result.

   salary  name
1   515.2   dan
3   729.0  ryan
5   578.0  nina
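
The row position of .loc[] also accepts a boolean condition in place of a list of labels. The sketch below is an illustration rather than part of the original example: it builds the DataFrame directly from a dict (an assumption, so that it runs without a file on disk) and selects the rows whose dept is "it".

```python
import pandas as pd

# Built directly from a dict so the sketch runs without a file on disk.
data = pd.DataFrame({
    "name": ["rick", "dan", "michelle", "ryan", "gary", "nina", "simon", "guru"],
    "salary": [623.3, 515.2, 611.0, 729.0, 843.25, 578.0, 632.8, 722.5],
    "dept": ["it", "operations", "it", "hr", "finance", "it", "operations", "finance"],
})

# Row labels and column labels, as in the example above.
by_label = data.loc[[1, 3, 5], ["salary", "name"]]

# A boolean mask in the row position keeps only the rows where the condition holds.
it_staff = data.loc[data["dept"] == "it", ["name", "salary"]]

print(by_label)
print(it_staff)
```

Both selections return a new DataFrame that keeps the original row labels, so the rows in it_staff are still labelled 0, 2 and 5.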

Reading JSON File as Records

We can also apply the to_json function with the orient and lines parameters to write the DataFrame content out as individual records, one JSON object per line.

import pandas as pd
data = pd.read_json('path/input.json')

# orient='records' emits one object per row; lines=True puts each on its own line
print(data.to_json(orient='records', lines=True))

When we execute the above code, it produces the following result.

{"dept":"it","id":1,"name":"rick","salary":623.3,"startdate":"1\/1\/2012"}
{"dept":"operations","id":2,"name":"dan","salary":515.2,"startdate":"9\/23\/2013"}
{"dept":"it","id":3,"name":"michelle","salary":611.0,"startdate":"11\/15\/2014"}
{"dept":"hr","id":4,"name":"ryan","salary":729.0,"startdate":"5\/11\/2014"}
{"dept":"finance","id":5,"name":"gary","salary":843.25,"startdate":"3\/27\/2015"}
{"dept":"it","id":6,"name":"nina","salary":578.0,"startdate":"5\/21\/2013"}
{"dept":"operations","id":7,"name":"simon","salary":632.8,"startdate":"7\/30\/2013"}
{"dept":"finance","id":8,"name":"guru","salary":722.5,"startdate":"6\/17\/2014"}
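
Output in this one-record-per-line form can be read back with read_json by passing lines=True on the reading side as well. The following is a minimal round-trip sketch, assuming a three-row excerpt of the data built in memory rather than read from a file.

```python
import io

import pandas as pd

# A small excerpt of the sample data, built in memory for illustration.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["rick", "dan", "michelle"],
    "salary": [623.3, 515.2, 611.0],
})

# Serialize to one JSON object per line.
json_lines = data.to_json(orient="records", lines=True)

# lines=True tells read_json to parse each line as a separate record.
restored = pd.read_json(io.StringIO(json_lines), lines=True)

print(restored)
```

The restored DataFrame has one row per line of the serialized text, which makes this layout convenient for appending records or processing a file line by line.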