Machine Learning with Apache Spark Quick Start Guide


Document databases

Document databases, such as Apache CouchDB and MongoDB, employ a document data model to store semi-structured and unstructured data. In this model, a document encapsulates all the information pertaining to an object, usually in JavaScript Object Notation (JSON) format, so each document is self-describing. Because documents are self-describing, different documents may have different schemas. For example, a document describing a movie item, as illustrated in the following JSON file, would have a different schema from a document describing a book item:

[
    {
        "title" : "The Imitation Game",
        "year": 2014,
        "metadata" : {
            "directors" : [ "Morten Tyldum"],
            "release_date" : "2014-11-14T00:00:00Z",
            "rating" : 8.0,
            "genres" : ["Biography", "Drama", "Thriller"],
            "actors" : ["Benedict Cumberbatch", "Keira Knightley"]
        }
    }
]
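
To make the contrast concrete, here is a minimal Python sketch that places the movie document next to a book document and compares their top-level fields. The book document is hypothetical: its field names and placeholder values are assumptions made for illustration and do not come from any real catalog. The helper `top_level_fields` is likewise an illustrative name, not part of any database API.

```python
import json

# The movie document from the JSON example above.
movie = {
    "title": "The Imitation Game",
    "year": 2014,
    "metadata": {
        "directors": ["Morten Tyldum"],
        "release_date": "2014-11-14T00:00:00Z",
        "rating": 8.0,
        "genres": ["Biography", "Drama", "Thriller"],
        "actors": ["Benedict Cumberbatch", "Keira Knightley"],
    },
}

# Hypothetical book document: fields and values are placeholders.
book = {
    "title": "An Example Book",
    "isbn": "000-0-00-000000-0",
    "authors": ["A. N. Author"],
    "pages": 100,
}

# Both documents can live in the same collection despite differing fields.
collection = [movie, book]


def top_level_fields(document):
    """Return the sorted top-level field names of a document."""
    return sorted(document.keys())


schemas = {doc["title"]: top_level_fields(doc) for doc in collection}
for title, fields in schemas.items():
    print(title, "->", fields)

# Each document carries its own structure, so a JSON round trip
# needs no external schema definition.
serialized = json.dumps(collection)
restored = json.loads(serialized)
```

The two documents share only the `title` field, yet nothing has to be declared up front for them to be stored and read back together.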

Because each document is a self-contained representation of an object, document databases are particularly useful for data models in which individual objects, and the fields they carry, are updated frequently: a single document can be changed without updating the schema of the entire database, as would be required with a relational database. Document databases therefore tend to suit use cases involving catalogs of items, such as e-commerce websites, and content management systems, such as blogging platforms.
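
The point about per-document updates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for a document store, and the identifiers `item-1` and `item-2`, the `streaming` field, and the `update_document` helper are all hypothetical rather than part of CouchDB, MongoDB, or any other product.

```python
import copy

# A plain dict standing in for a catalog collection (assumption:
# a real document database would persist and index these documents).
catalog = {
    "item-1": {"title": "The Imitation Game", "year": 2014},
    "item-2": {"title": "An Example Book", "pages": 100},  # hypothetical item
}


def update_document(store, doc_id, changes):
    """Merge changes into one document, leaving all others untouched."""
    store[doc_id].update(changes)
    return store[doc_id]


# Snapshot the other document so we can confirm it is unaffected.
before_other = copy.deepcopy(catalog["item-2"])

# Add a brand-new field to a single document; no schema migration occurs.
updated = update_document(catalog, "item-1", {"streaming": True})
print(updated)
```

In a relational table, the equivalent change would typically mean adding a column that applies to every row; here only the one document gains the new field.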