JSON's impact on memory

anand pathak
4 min readAug 13, 2017

I was working with a Lambda function where a CSV is provided as input, and operations like mapping fields to the database, processing fields, and calculations are performed on the CSV. The CSV could have 20 million records, so operating on the complete CSV at once was costly; thus we ran it in batches. One of the operations running on each row of the CSV mapped it against MySQL data, and that generated a large load on the MySQL server due to the high number of queries. To solve this, we created a JSON file from the MySQL query, since we knew the table's data wouldn't change regularly.

At that moment we were not aware that making it faster might mean using more memory. When the JSON was loaded into memory it consumed ~700 MB, which was surprising because the JSON file itself was only about 60 MB. That made us curious: how is JSON loaded into memory?

To understand how JSON is stored in memory, I asked a question on StackOverflow and received one answer that explains it well; quoting the answer below:

JSON string length and size of the corresponding object in memory are not strictly correlated; in most cases the JSON representation is expected to be smaller, sometimes by a lot. E.g. for the innermost nesting of your example: "a":0, takes 6 bytes, whereas for one more property in the created object, you need:

  • one pointer for the property’s name, “a”
  • one pointer for the property’s attributes (writable, enumerable, configurable)
  • one pointer for the property’s value, 0
  • assuming the object is in dictionary mode: on average, approximately two pointers of slack

On a 64-bit platform, that adds up to ~40 bytes.

If you look at an entire object of similar shape: {"a":0,"b":1} is 13 characters, whereas the memory requirement is:

  • “map” pointer
  • “elements” pointer (unused)
  • out-of-object “properties” pointer (unused)
  • value of first property (0)
  • value of second property (1)
  • the object’s “map”: 11 pointers (could be shared with other objects of the same shape, but if you have only one such object, there’s nothing to share it with)
  • the object’s map’s property descriptors: 10 pointers

In total, 26 pointers or 208 bytes. (from https://stackoverflow.com/a/45018888/4424585)
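You can get a rough feel for this overhead yourself by measuring heap growth while parsing a JSON string. This is only a sketch: exact numbers vary by V8 version and platform, and running node with --expose-gc (so global.gc is available) gives stabler readings.

```javascript
// Measure approximate heap growth caused by running fn().
function heapDelta(fn) {
  if (global.gc) global.gc(); // available only with node --expose-gc
  const before = process.memoryUsage().heapUsed;
  const result = fn();
  const after = process.memoryUsage().heapUsed;
  return { result, bytes: after - before };
}

// 100,000 small properties: the JSON string is roughly 1.5 MB,
// while the parsed object typically occupies several times more heap.
const json = JSON.stringify(
  Object.fromEntries(Array.from({ length: 100000 }, (_, i) => [`key_${i}`, i]))
);
const { bytes } = heapDelta(() => JSON.parse(json));
console.log('JSON string length:', json.length);
console.log('approx heap bytes for parsed object:', bytes);
```

The delta is only approximate (other allocations and garbage collection can shift it), but the gap between string size and object size is usually obvious.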

After I understood how JSON is laid out in memory, the very next question that came to my mind was:

How much does the structure of JSON impact memory?

To understand this, I created different types of JSON files and then loaded them into memory.

  1. Blank object as value: 78 MB JSON file with 10,000,000 keys where the values are empty objects, e.g.
    { "random_key_1": {}, "random_key_2": {} ….. }
  2. String as value: 170 MB JSON file with 10,000,000 keys (incrementing numbers) where the values are random strings, e.g.
    { "1": "random_value", "2": "random_value" }
  3. Two levels of JSON: 170 MB JSON file with 7,500,000 keys where each value is a JSON object with one key (an incrementing number) and a random string as its value, e.g.
    { "1": {"1": "random_value"}, "2": {"2": "random_value"} }
  4. Array: 76 MB JSON file with 10,000,000 values inside an array in an object, e.g.
    { "arr": [ "random_value_1", "random_value_2" … ] }
  5. Nested JSON as value: 60 KB JSON file with 5,000 keys where each key holds a sub-JSON object as its value, e.g.
    { "key1": { "key2": { "key3": {} } } }

Once I had the files, I created a simple Node.js script which loads and parses the JSON, and logged the memory usage before and after importing the JSON file.

console.log(process.memoryUsage());
let data = require('./data/FirstJSON.json');
console.log(process.memoryUsage());

Below is the graph showing the memory used by the different JSON files when loaded into memory.

From the graph, it can be seen that the array structure requires the least space. But comparing the blank-object, two-level, and string-value files, it is hard to conclude that structure alone has much impact on memory. The two-level JSON does occupy noticeably more space, but that is because 7.5 million keys are nested inside 7.5 million sub-objects. The string-value file uses more memory than the blank-object file simply because strings require more space than empty objects.

In the graph, the bar for "Nested JSON as value" is the same before and after loading the JSON. The basic reason is that the number of keys (which equals the nesting depth) is only 5,000. You might wonder why I used only 5,000 keys here: when I increased the number, I received an error:

RangeError:/path/data/hierarchical.json: Maximum call stack size exceeded

This suggests that parsing/storing nested JSON uses recursion, and therefore the call stack, which has a limit. It helped me conclude that the structure of JSON does matter. It matters little when the JSON is small, but when the JSON is large it is really important to choose the structure wisely.

Then I had another question: is there any way to work with large JSON structures?

I was not looking for Elasticsearch, DynamoDB, or MongoDB solutions; rather, I was interested in something custom that would let me load JSON from a file and operate on it with less impact on memory. I found a Node.js package, JSONStream, which streams the JSON file and only keeps in memory the part we are currently using:
https://www.npmjs.com/package/JSONStream
I also came across some interesting JSON-related formats like BSON and MessagePack. The links below will help you understand them in depth:

https://stackoverflow.com/questions/6355497/performant-entity-serialization-bson-vs-messagepack-vs-json
http://techblog.procurios.nl/k/news/view/14605/14863/how-do-i-write-my-own-parser-(for-json).html
http://msgpack.org/index.html
http://bsonspec.org/
http://www.ecma-international.org/ecma-262/8.0/index.html#sec-json-object

Originally published at anand-pathak.tumblr.com.
