Putting and Getting Data from a Database

April 30, 2011

Databases support the putting and getting of data. What distinguishes one type of database from another is the structure of the data it stores and the means by which that data is retrieved. There are numerous types of databases; this short post discusses only four: the primitive store, the key/value store, the document store, and the graph database.

Primitive Store: One can imagine the most basic database, which contains only simple data values. Let’s call such a database a “primitive store.” With a primitive store, there is no structure to the data. Each datum is a primitive value in a single unordered collection (i.e. a bag of primitives). Primitives can be inserted into the bag and later retrieved, either all at once or filtered by a predicate.

// put data
db.put('marko');
db.put(31);
db.put(true);
// get data: everything, then only the strings that start with 'ma'
Iterator results = db.get();
Iterator filteredResults = db.get{it instanceof String && it.startsWith('ma')};
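
To make the idea concrete, here is a minimal runnable sketch in Python rather than the Groovy-style pseudocode above. The class name PrimitiveStore and its methods are made up for illustration and do not belong to any real database. A list stands in for the bag, so duplicates are kept and the insertion order you see is incidental.

```python
from typing import Any, Callable, Iterator, Optional


class PrimitiveStore:
    """Hypothetical in-memory bag of primitive values (not a real library)."""

    def __init__(self) -> None:
        self._items = []  # a list keeps duplicates, as a bag should

    def put(self, value: Any) -> None:
        self._items.append(value)

    def get(self, predicate: Optional[Callable[[Any], bool]] = None) -> Iterator[Any]:
        """Yield every value, or only the values the predicate accepts."""
        for value in self._items:
            if predicate is None or predicate(value):
                yield value


db = PrimitiveStore()
db.put("marko")
db.put(31)
db.put(True)

everything = list(db.get())
# The type check keeps the predicate from failing on 31 and True.
filtered = list(db.get(lambda v: isinstance(v, str) and v.startswith("ma")))
```

Because the values are untyped, any predicate has to guard against types it cannot handle, which is the price of having no structure at all.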

Key/Value Store: For key/value-based storage systems, the data structure is an ordered 2-tuple that contains a key and a value. A collection of such pairs is also known as an associative array, a map, or a dictionary. When interacting with a key/value store, data pairs are inserted into the database and values are retrieved based on their keys.

// put data
db.put('name', 'marko');
db.put('age', 31);
// get data
Object value = db.get('name');
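
The same interaction can be sketched in a few lines of runnable Python. KeyValueStore is a hypothetical name, and the choice to overwrite on a repeated key and to return None for a missing key are assumptions of this sketch; real stores differ on both points.

```python
from typing import Any, Hashable


class KeyValueStore:
    """Hypothetical in-memory key/value store backed by a dict."""

    def __init__(self) -> None:
        self._pairs = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._pairs[key] = value  # assumption: a repeated key overwrites

    def get(self, key: Hashable) -> Any:
        return self._pairs.get(key)  # assumption: a missing key yields None


db = KeyValueStore()
db.put("name", "marko")
db.put("age", 31)

name = db.get("name")
missing = db.get("email")
```

Note that the only access path is the key. There is no way to ask for "the key whose value is marko" short of scanning every pair.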

Document Store: The next level of structural complexity is the document store. Documents are tree/nested structures such as those provided by XML and JSON. This added level of complexity provides more modeling flexibility to the developer. For example, one can model a person as a document as follows:

Document document = {
  type : "person",
  name : "marko",
  age : 31,
  skills : {
    languages : ["java", "groovy", "gremlin", "R"]
  }
}

To find a person that is skilled in R, a “query object” is constructed. In a manner similar to classic tuple systems, the query object provides required bindings. For all fields not specified, a wildcard is assumed.

// put data
db.put(document);
// get data
Document result = db.findOne({ 
    type : "person", 
    "skills.languages" : "R"
}); 
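
How a query object might be matched against stored documents can be sketched in Python. This is an assumption-laden illustration, not the API of any particular document store: the helper names lookup, matches, and find_one are invented, dotted paths are taken to walk nested fields, and a scalar is taken to match a list field if the list contains it. The second person, "peter", is made-up sample data.

```python
from typing import Any, Dict, Iterable, Optional

_MISSING = object()  # sentinel meaning "the field is not present"


def lookup(document: Dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path such as 'skills.languages' into nested dicts."""
    current: Any = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(document: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every field named in the query has the required binding.

    Fields the query does not mention are never inspected (the wildcard).
    """
    for path, expected in query.items():
        actual = lookup(document, path)
        if actual is _MISSING:
            return False
        in_list = isinstance(actual, list) and expected in actual
        if actual != expected and not in_list:
            return False
    return True


def find_one(documents: Iterable[Dict[str, Any]], query: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first matching document, or None if nothing matches."""
    return next((doc for doc in documents if matches(doc, query)), None)


documents = [
    {"type": "person", "name": "peter", "age": 35,
     "skills": {"languages": ["java", "python"]}},
    {"type": "person", "name": "marko", "age": 31,
     "skills": {"languages": ["java", "groovy", "gremlin", "R"]}},
]

result = find_one(documents, {"type": "person", "skills.languages": "R"})
```

The query never mentions name or age, so those fields are effectively wildcards. An empty query object matches every document.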

Finally, most key/value and document stores support the MapReduce pattern. With MapReduce, each document (or key/value pair) is processed in parallel by a map() function, and the intermediate results are grouped by key and aggregated by a reduce() function before being returned to the user. For example, a map() and reduce() function can be written to determine the distribution of languages among the people in the database.
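
A single-process Python sketch of that language-distribution job follows. It only imitates the shape of MapReduce: the function names are invented, the loop runs sequentially where a real system would distribute the work, and "peter" is made-up sample data.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

people = [
    {"name": "marko", "skills": {"languages": ["java", "groovy", "gremlin", "R"]}},
    {"name": "peter", "skills": {"languages": ["java", "python"]}},
]


def map_languages(doc: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    """Emit one (language, 1) pair per language a person knows."""
    for language in doc.get("skills", {}).get("languages", []):
        yield language, 1


def reduce_counts(language: str, counts: List[int]) -> int:
    """Collapse all the 1s emitted for a language into a total."""
    return sum(counts)


def map_reduce(documents: Iterable[Dict[str, Any]],
               map_fn: Callable, reduce_fn: Callable) -> Dict[str, int]:
    grouped = defaultdict(list)
    for doc in documents:  # a real system would run this step in parallel
        for key, value in map_fn(doc):
            grouped[key].append(value)  # the "shuffle": group values by key
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


distribution = map_reduce(people, map_languages, reduce_counts)
```

The map step looks at one document at a time and needs no shared state, which is what makes the pattern easy to parallelize.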

Graph Database: The data structure of a graph database is a graph: an arbitrarily connected structure that subsumes sequences, trees, lattices, and cycles. In some ways, a graph database is like a document database, except that particular fields in a document can make direct reference to other documents. In graph jargon, the objects are known as vertices and the relationships between vertices are known as edges.

Vertex v = db.putVertex({'name' : 'marko', 'age' : 31});
Vertex u = db.putVertex({'language' : 'gremlin'});
db.putEdge(v, 'hasSkill', u);
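
As a rough Python sketch of what such a store keeps internally, assume vertices are property maps addressed by an integer id and edges are labeled (tail, label, head) triples. GraphStore and its methods are hypothetical names chosen to mirror the pseudocode above.

```python
from typing import Any, Dict, List, Tuple


class GraphStore:
    """Hypothetical in-memory property graph (not a real library)."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Dict[str, Any]] = {}  # id -> properties
        self.edges: List[Tuple[int, str, int]] = []    # (tail, label, head)

    def put_vertex(self, properties: Dict[str, Any]) -> int:
        vertex_id = len(self.vertices)  # ids are simply 0, 1, 2, ...
        self.vertices[vertex_id] = dict(properties)
        return vertex_id

    def put_edge(self, out_vertex: int, label: str, in_vertex: int) -> None:
        self.edges.append((out_vertex, label, in_vertex))

    def out(self, vertex: int, label: str) -> List[int]:
        """Ids of the vertices reached by following outgoing edges with this label."""
        return [head for tail, edge_label, head in self.edges
                if tail == vertex and edge_label == label]


db = GraphStore()
v = db.put_vertex({"name": "marko", "age": 31})
u = db.put_vertex({"language": "gremlin"})
db.put_edge(v, "hasSkill", u)

skills = [db.vertices[i]["language"] for i in db.out(v, "hasSkill")]
```

The edge list is what separates this from a document store: the relationship between marko and gremlin is a first-class object rather than a value nested inside one of the documents.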

There are two ways to retrieve data from a graph database: pattern matching and traversing. A pattern match query is similar, in many ways, to document querying in a document store. A graph pattern is defined where particular components are variables (i.e. wildcards). The classic language for graph-based pattern matching is SPARQL.

PREFIX : <http://example.org/>  # placeholder namespace for this example
SELECT ?x ?y WHERE {
  ?x :knows :marko .
  :marko :hasSkill ?y .
  ?x :hasSkill ?y
}

In the query above, the variables ?x and ?y bind to people and skills, respectively. The graph pattern constrains those bindings: ?x binds only to people who know Marko, and ?y binds only to skills that such a person shares with him.
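
The binding process can be imitated in Python with a tiny backtracking matcher over a list of triples. This is a sketch of the idea only, not how a SPARQL engine is implemented; the data and the function names match_pattern and solve are invented for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Made-up sample graph, stored as (subject, predicate, object) triples.
triples: List[Triple] = [
    ("josh", "knows", "marko"),
    ("peter", "knows", "marko"),
    ("marko", "hasSkill", "gremlin"),
    ("marko", "hasSkill", "java"),
    ("josh", "hasSkill", "gremlin"),
    ("peter", "hasSkill", "python"),
]


def match_pattern(pattern: Triple, triple: Triple,
                  bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend bindings so that pattern equals triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # a variable
            if extended.setdefault(term, value) != value:
                return None                           # conflicts with earlier binding
        elif term != value:                           # a constant that differs
            return None
    return extended


def solve(patterns: List[Triple],
          bindings: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every set of bindings that satisfies all patterns at once."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_pattern(first, triple, bindings)
        if extended is not None:
            yield from solve(rest, extended)


query = [
    ("?x", "knows", "marko"),
    ("marko", "hasSkill", "?y"),
    ("?x", "hasSkill", "?y"),
]
results = sorted((b["?x"], b["?y"]) for b in solve(query))
```

In this sample data peter knows marko but shares no skill with him, so only josh and gremlin survive all three patterns.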

The second query model is graph traversing. With a graph traversal, an arbitrary number of vertices/edges can be touched (and looped over) in order to yield end points (traversal destinations), paths (traversal history), or a resultant computation (traversal side-effect). In the graph traversal language Gremlin, the ages of the people that share a skill with Marko can be determined using the following query (traversal destination):

marko.out('hasSkill').in('hasSkill').except([marko]).age
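
A step-by-step Python rendering of that traversal may help. It assumes a plain list of labeled edges, and the helper names out and in_ are invented; the ages of josh and peter are made-up sample values. Each helper maps a list of current vertices to the list of vertices one step away, so a vertex reached along two different paths appears twice.

```python
from typing import List, Tuple

# Made-up sample data: (tail, label, head) edges and an age per person.
edges: List[Tuple[str, str, str]] = [
    ("marko", "hasSkill", "gremlin"),
    ("marko", "hasSkill", "java"),
    ("josh", "hasSkill", "gremlin"),
    ("peter", "hasSkill", "java"),
    ("peter", "hasSkill", "gremlin"),
]
age = {"marko": 31, "josh": 32, "peter": 35}


def out(vertices: List[str], label: str) -> List[str]:
    """Follow outgoing edges with the given label from each vertex."""
    return [head for v in vertices
            for tail, edge_label, head in edges
            if tail == v and edge_label == label]


def in_(vertices: List[str], label: str) -> List[str]:
    """Follow incoming edges with the given label back from each vertex."""
    return [tail for v in vertices
            for tail, edge_label, head in edges
            if head == v and edge_label == label]


skills = out(["marko"], "hasSkill")               # marko's skills
holders = in_(skills, "hasSkill")                 # everyone with those skills
others = [p for p in holders if p != "marko"]     # the except([marko]) step
ages = [age[p] for p in others]
```

Here peter shares two skills with marko, so his age shows up twice. Whether that repetition is wanted depends on the question being asked; deduplicating would be one more step.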

To determine all non-looping friendship paths between Marko and Josh, the following can be evaluated (traversal history):

marko.out('friend').uniquePath.loop(2){it.object != josh}.paths
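
The same question can be phrased as a search for simple paths. The Python sketch below assumes a hypothetical adjacency map of 'friend' edges and an invented function name simple_paths. Like the traversal above, it stops extending a path once it reaches the target and refuses to revisit a vertex already on the path.

```python
from typing import Dict, Iterator, List

# Made-up 'friend' edges, as vertex -> list of friends.
friends: Dict[str, List[str]] = {
    "marko": ["peter", "josh"],
    "peter": ["josh", "marko"],
    "josh": [],
}


def simple_paths(start: str, goal: str,
                 adjacency: Dict[str, List[str]]) -> Iterator[List[str]]:
    """Yield every path from start to goal that visits no vertex twice."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        for neighbor in adjacency.get(path[-1], []):
            if neighbor in path:
                continue                  # would loop: the uniquePath filter
            extended = path + [neighbor]
            if neighbor == goal:
                yield extended            # reached josh: emit and stop here
            else:
                stack.append(extended)    # keep walking from this vertex


paths = sorted(simple_paths("marko", "josh", friends))
```

What comes back is the history of each walk rather than just its end point, which is the distinction between a path query and a destination query.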

Finally, to determine the centrality of every vertex in the graph, a traversal of the following form can be used, where m is a map that stores the resulting centrality score for each vertex (traversal side-effect):

m = [:]
g.V.out.groupCount(m).loop(2){it.loops < 100}
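
One way to read that traversal: start a walker on every vertex, move every walker along every outgoing edge, count where the walkers land, and repeat. The Python sketch below follows that reading under stated assumptions. It tracks how many walkers sit on each vertex instead of tracking them individually, the function name group_count_walk and the sample edges are invented, and the exact number of repetitions the Gremlin loop performs is not modeled, so steps is left as a parameter.

```python
from collections import Counter
from typing import List, Tuple


def group_count_walk(edges: List[Tuple[str, str]], steps: int) -> Counter:
    """Count how often each vertex is reached when walkers step along out-edges."""
    vertices = {v for edge in edges for v in edge}
    current = Counter({v: 1 for v in vertices})  # one walker per vertex (g.V)
    m = Counter()                                # the side-effect map
    for _ in range(steps):
        following = Counter()
        for tail, head in edges:
            following[head] += current[tail]     # every walker takes every out-edge
        m.update(following)                      # groupCount: add this round's visits
        current = following
    return m


# Made-up directed graph: a -> b, a -> c, b -> c, c -> a.
sample_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
m = group_count_walk(sample_edges, steps=2)
```

After two steps on this small graph, c has been visited most often because both a and b point to it. The raw counts grow quickly with the number of steps, so in practice only their relative sizes are of interest.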

There are numerous types of databases, and each provides a means of storing and retrieving structured data. Depending on the complexity of the domain being modeled and the types of computations required, there is likely a database out there to meet the needs of a given project.
