MarkLogic is a database designed to store and retrieve heterogeneous data of different types. It represents data as text, binary, JSON, and XML documents.


MarkLogic Speed

A MarkLogic database can store terabytes of data. According to the MarkLogic documentation, MarkLogic can deliver data with sub-second response times.
The largest deployments reportedly exceed 200 terabytes and a billion documents, and are expected to surpass a petabyte.


MarkLogic Data Ingestion

Examples of data stored in MarkLogic include tweets, flight records, manuals, books, and web pages. If the data is not in XML or JSON format, it can be converted during ingestion.
MarkLogic supports semantic data as RDF (Resource Description Framework) triples, which can be queried with a query language called SPARQL.
In addition, MarkLogic supports binary documents, which are handled differently according to their size.

Languages and APIs supported by MarkLogic

First of all, what is an API?

API stands for Application Programming Interface. The languages and interfaces that can be used with MarkLogic are:
1. XQuery (native, server-side support)
2. JavaScript (native, server-side support)
3. XSLT (native, server-side support)
4. Java
5. Node.js
6. C#
7. SQL (native, server-side support)
8. SPARQL (native, server-side support)
9. REST interfaces

MarkLogic communicates natively over HTTP and HTTPS, the XDBC protocol (similar to what JDBC and ODBC are for relational databases), and the WebDAV file protocol.

MarkLogic REST API

MarkLogic provides a REST API that exposes its features as web services. It allows you to create, read, update, and delete documents (CRUD).
Structure: http://host:port/version/service/...
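
To make the URL pattern concrete, here is a minimal JavaScript sketch that assembles a request URL from its parts. The helper name `buildRestUrl` and its parameter names are assumptions made for this example, not part of any MarkLogic client library; the host, port, and search service are the values used later in this article.

```javascript
// Hypothetical helper (not a MarkLogic API): assembles a REST URL
// following the pattern http://host:port/version/service/...
function buildRestUrl({ host, port, version, service, params = {} }) {
  const url = new URL(`http://${host}:${port}/${version}/${service}`);
  // URLSearchParams takes care of percent-encoding the values.
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

const searchUrl = buildRestUrl({
  host: "localhost",
  port: 8011,
  version: "LATEST",
  service: "search",
  params: { q: "apple" },
});
console.log(searchUrl); // http://localhost:8011/LATEST/search?q=apple
```

Building the URL through `URL` rather than string concatenation keeps query values correctly encoded when they contain spaces or special characters.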

Create a REST API instance

curl -v -X POST  --anyauth -u admin:admin \
  --header "Content-Type:application/json" \
  -d '{"rest-api": { "name": "haiz", "port": "8011", "database": "haizDB", "modules-database": "haiz-Modules" } }' \
  'http://localhost:8002/v1/rest-apis'

Create two REST users

Writer
curl -v -X POST  --anyauth -u admin:admin \
  --header "Content-Type:application/json" \
  -d '{"user-name":"haiz-writer", "password": "haizly", "role": ["rest-writer"]}' \
  'http://localhost:8002/manage/v2/users'
Admin
curl -v -X POST  --anyauth -u admin:admin \
  --header "Content-Type:application/json" \
  -d '{"user-name":"haiz-admin", "password": "haizly", "role": ["admin"]}' \
  'http://localhost:8002/manage/v2/users'

Verify that the haiz-writer and haiz-admin users exist by opening http://localhost:8002/manage/v2/users in a browser window. You should see the MarkLogic user list, including both users.

How to search data?

You have two options:

1. Open this URL in the browser: http://localhost:8011/LATEST/search?q=apple
2. Use curl and specify the user and password: curl -X GET --anyauth --user haiz-writer:haizly 'http://localhost:8011/LATEST/search?q=chicken&format=json'

Create a Text document

No file is needed, because the text is passed inline. Run this command:

    curl -v -X PUT \
    --digest --user haiz-writer:haizly \
    -d "I'm a text" \
    -H "Content-type: text/plain" \
    'http://localhost:8011/LATEST/documents?uri=/example/recipe.txt'

HTTP Response: “201 Document Created”

Create a JSON document

The JSON content is passed inline. Run this command:

    curl -v -X PUT \
    --digest --user haiz-writer:haizly \
    -d'{"recipe": {"name" :"Orange", "ingredients":"oil"}}' \
    -H "Content-type: application/json" \
    'http://localhost:8011/LATEST/documents?uri=/example/recipe.json'

HTTP Response: “201 Document Created”

Create an XML document

The XML content is passed inline. Run this command:

    curl -v -X PUT \
    --digest --user haiz-writer:haizly \
    -d'<person><first>Carl</first><last>Sagan</last></person>' \
    -H "Content-type: application/xml" \
    'http://localhost:8011/LATEST/documents?uri=/example/person.xml'

HTTP Response: “201 Document Created”
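
The three commands above follow the same pattern: a PUT to the documents service with the target URI passed as a parameter. As an illustration, the JavaScript sketch below picks a content type from the document URI and describes the corresponding request. The function name, the extension table, and the shape of the returned object are assumptions made for this example; nothing here is a MarkLogic API, and the sketch only builds the request without sending it.

```javascript
// Illustrative mapping from URI extension to Content-type header.
// The table is an assumption for this sketch, not a MarkLogic API.
const CONTENT_TYPES = {
  ".txt": "text/plain",
  ".json": "application/json",
  ".xml": "application/xml",
};

// Hypothetical helper: describes the PUT request for one document.
function describeDocumentPut(baseUrl, uri, body) {
  const dot = uri.lastIndexOf(".");
  const extension = dot === -1 ? "" : uri.slice(dot);
  const contentType = CONTENT_TYPES[extension];
  if (!contentType) {
    throw new Error(`No content type known for "${uri}"`);
  }
  return {
    method: "PUT",
    url: `${baseUrl}/documents?uri=${encodeURIComponent(uri)}`,
    headers: { "Content-type": contentType },
    body,
  };
}

const request = describeDocumentPut(
  "http://localhost:8011/LATEST",
  "/example/person.xml",
  "<person><first>Carl</first><last>Sagan</last></person>"
);
console.log(request.method, request.url, request.headers["Content-type"]);
```

The returned object could be handed to the HTTP client of your choice. Authentication (digest, as in the curl commands) is deliberately left out of the sketch.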

Query and Search

Query by Example

For example, say you want to find all documents representing bags that are black. You could do this by posting a query like:

{ "$query":  
    { "color" :  "black" }
}
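
As a small illustration, a Query by Example payload can be assembled from a plain object in JavaScript before it is posted to the server. The helper name `buildQbe` is hypothetical; the only thing taken from the text above is the "$query" wrapper around the example properties.

```javascript
// Hypothetical helper: wraps example properties in a "$query" object.
function buildQbe(example) {
  return { $query: { ...example } };
}

const qbe = buildQbe({ color: "black" });
// JSON.stringify emits valid JSON with quoted property names.
console.log(JSON.stringify(qbe)); // {"$query":{"color":"black"}}
```

Serializing with `JSON.stringify` avoids hand-written payloads with unquoted property names, which are not valid JSON.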

String Query

curl -X GET \
  --anyauth -u username:password \
  'http://myhost:port/v1/search?q=mocha&collection=drinks&format=json'

Structured Query

{ "query":  
    { "and-query": 
        { "queries" : [
            {
                "properties-constraint-query": {
                    "constraint-name": "property-constraint",
                    "term-query" : {
                        "text": [ "done" ]
                    }
                }
            },
            {
                "word-constraint-query": { 
                    "constraint-name": "author-word", 
                    "text": "Haiz" 
                } 
            }
        ] } 
    } 
}
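
Nested structured queries are easy to get wrong when written by hand, so it can help to assemble them from small functions. The sketch below rebuilds the query above in JavaScript. The builder function names are made up for this example and are not a MarkLogic client API; the constraint names are the ones from the query above, and the term query is assumed to carry its terms under a "text" key.

```javascript
// Hypothetical builders; each returns one structured-query node.
const termQuery = (...terms) => ({ "term-query": { text: terms } });

const propertiesConstraintQuery = (constraintName, subQuery) => ({
  "properties-constraint-query": {
    "constraint-name": constraintName,
    ...subQuery,
  },
});

const wordConstraintQuery = (constraintName, text) => ({
  "word-constraint-query": { "constraint-name": constraintName, text },
});

const andQuery = (...queries) => ({ "and-query": { queries } });

const structuredQuery = {
  query: andQuery(
    propertiesConstraintQuery("property-constraint", termQuery("done")),
    wordConstraintQuery("author-word", "Haiz")
  ),
};

// Pretty-print the payload that would be sent to the search service.
console.log(JSON.stringify(structuredQuery, null, 2));
```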

Combine MarkLogic with other technologies

You might be wondering in which cases you can benefit from MarkLogic. We have already seen a course on Apache NiFi.

A data engineering scenario could, for example, retrieve datasets computed using C++ and streamed through Apache Kafka, then use NiFi to ingest that data into MarkLogic.


Exercise 1: How to find JSON documents that do not have a specific property?

Let us imagine that you would like to retrieve, from the database, all the documents that do not have a specific property, “myproperty”. How would you do it?

It is very simple. Using JavaScript, you can write the following query:

// Restrict the search to one collection, then exclude every document
// that contains the JSON property 'myproperty'.
cts.search(cts.andQuery([
        cts.collectionQuery('collection-name'),
        cts.notQuery(
            cts.jsonPropertyScopeQuery(
                'myproperty', 
                cts.trueQuery()
            )
        )]));
// or, with an element query and an empty and-query (which matches everything)
cts.search(cts.andQuery([
        cts.collectionQuery('collection-name'),
        cts.notQuery(
            cts.elementQuery(
                xs.QName('myproperty'), 
                cts.andQuery([])
            )
        )]));

And, with XQuery, you can write the following request:

cts:search(
  fn:doc(),
  cts:not-query(
    cts:json-property-scope-query(
      "myproperty",
      cts:true-query()
    )
  )
)

Let me explain the code a little.
First, you need to build a query to pass to cts.search.
The query must give you access to JSON properties, so you use cts.jsonPropertyScopeQuery().
Do not forget to add cts.trueQuery() (cts:true-query() in XQuery) as the inner query: it matches everything, so the scope query matches any document that contains the property, whatever its value. Wrapping it in cts.notQuery() then keeps only the selected documents (collection-name) that lack the property.
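
To see what the combination of a not-query and a property scope query selects, here is a plain JavaScript sketch that applies the same logic to in-memory objects. It does not use MarkLogic at all: the function names and sample documents are invented for illustration, and the recursive lookup reflects my understanding that a property scope query matches the property at any depth of the document.

```javascript
// Returns true if `name` is a property of `value` at any depth.
function hasProperty(value, name) {
  if (Array.isArray(value)) {
    return value.some((item) => hasProperty(item, name));
  }
  if (value !== null && typeof value === "object") {
    return Object.keys(value).some(
      (key) => key === name || hasProperty(value[key], name)
    );
  }
  return false; // strings, numbers, booleans and null have no properties
}

// In-memory stand-in for "not(property scope query)".
function withoutProperty(documents, name) {
  return documents.filter((doc) => !hasProperty(doc, name));
}

const sampleDocuments = [
  { id: 1, myproperty: "a" },
  { id: 2, other: "b" },
  { id: 3, nested: { myproperty: "c" } },
];

const missing = withoutProperty(sampleDocuments, "myproperty");
console.log(missing.map((doc) => doc.id)); // [ 2 ]
```

Only document 2 is returned: document 1 has the property at the top level, and document 3 has it inside a nested object.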

For more in-depth knowledge, see the following article on MarkLogic.

Written by

Albert Oplog

Hi, I'm Albert Oplog. I would humbly like to share my tech journey with people all around the world.