A little overview of Elasticsearch
Elasticsearch is a distributed, open-source full-text search engine that works in near real time.
It is accessible through a RESTful web service interface and stores data as schema-less JSON (JavaScript Object Notation) documents.
Elasticsearch can be used as a replacement for document stores like MongoDB.
The following are key terminologies associated with Elasticsearch:
i) Node: It is a single running instance of Elasticsearch. Multiple nodes could be running on a single server, depending on its resource capabilities.
ii) Cluster: As the name suggests, it is a collection of one or more nodes that provides collective indexing and search capabilities across all nodes.
iii) Index: An index is a collection of documents and their properties. This means that you can have a collection of documents that contains data for a specific part of the application. In an RDBMS it is roughly analogous to a table.
iv) Field: A field is analogous to a column in an RDBMS.
v) Document: A document is a collection of fields in JSON format. In an RDBMS it is analogous to a row. Every document has a unique identifier, stored in its “_id” field.
vi) Shard: Indexes are horizontally subdivided into shards. This means that each shard contains all the properties of the documents but holds fewer JSON objects. It is a subset of the entire index. Each shard acts like an independent index and can be stored on any node. The primary shard is the original horizontal part of an index.
vii) Replicas: Elasticsearch allows users to create replicas of their indexes and shards. Replication increases the availability of data in case of failure. It also improves performance, because search operations can run in parallel on the replicas.
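To make the idea of shards a bit more concrete, here is a minimal sketch of how a document could be routed to one of several primary shards. Elasticsearch derives the shard from a hash of the routing value (the document id by default) modulo the number of primary shards. The function name pick_shard is hypothetical, and crc32 is only a stand-in for the hash that Elasticsearch actually uses internally.

```php
<?php
// Illustrative only: Elasticsearch hashes the routing value (the document id
// by default) with its own hash function. crc32() stands in for it here.
function pick_shard(string $routing, int $primaryShards): int
{
    // On 64-bit PHP builds crc32() returns a non-negative integer.
    return crc32($routing) % $primaryShards;
}

// Spread a few hypothetical document ids over 3 primary shards.
foreach (['1', '2', 'post-42'] as $id) {
    echo "document {$id} -> shard " . pick_shard($id, 3) . PHP_EOL;
}
```

The same id always lands on the same shard, which is why the number of primary shards of an index cannot simply be changed after documents have been stored.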
Alright! Now that you are familiar with Elasticsearch, let’s put it to use in our simple PHP application.
Installation & Configuration
We need to set up an Elasticsearch node. So install it from the instructions provided here.
I’m going to be using Ubuntu for this article. I am running Ubuntu on Vagrant. We can install via apt-get, but I’ll download the Debian package instead and run it.
#download with a simple name
wget -c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.8.0-amd64.deb -O elastic.deb
#run installer
sudo dpkg -i elastic.deb
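Elasticsearch also needs Java to run. The Debian package ships with a bundled JDK, so a separate installation is optional. If you prefer to use a system-wide Java, note that Elasticsearch 8 requires Java 17 or later:
sudo apt-get update
sudo apt install openjdk-17-jre-headless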
We cannot run Elasticsearch as the root user. So, we need to give our user the required permission on certain files:
sudo chown -R $USER:$GROUP /etc/default/elasticsearch
sudo chown -R $USER:$GROUP /usr/share/elasticsearch/
sudo chown -R $USER:$GROUP /etc/elasticsearch
sudo chown -R $USER:$GROUP /var/lib/elasticsearch
sudo chown -R $USER:$GROUP /var/log/elasticsearch
Modify its configuration file, located at /etc/elasticsearch/elasticsearch.yml, with the content below:
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
xpack.security.http.ssl:
  enabled: false
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: false
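Configure the Java path for Elasticsearch. Open /etc/default/elasticsearch and set ES_JAVA_HOME to the Java home directory (not to the java binary itself). For example, for OpenJDK 17 on Ubuntu (adjust the path to match the version you installed):
ES_JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64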
Start the Elasticsearch service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
Alternatively, you can run the Elasticsearch instance in the foreground as your own user:
/usr/share/elasticsearch/bin/elasticsearch
Manual Testing
We need a REST client to perform manual operations on Elasticsearch. We are going to use Visual Studio Code, a popular editor, so download it if you don’t have it already. We will use the “Thunder Client” extension, which is available in Visual Studio Code. Open the editor and click the extensions icon in the left sidebar. A drawer will open. In the search field, enter “Thunder Client” and click the first item in the list. Then, in the content area, click the Install button.
Once it is installed, you’ll find the “Thunder Client” icon in the left sidebar. Click on it to open a side drawer, then click the “New Request” button.
In the content area, select the “PUT” method, enter the URL http://127.0.0.1:9200/blog, and hit the Send button.
It’ll create an index by the name of “blog” as verified in the response:
Let’s test further by adding a record inside this index.
Click the “New Request” button on the left side of the editor, select the “POST” method, and enter the URL as http://127.0.0.1:9200/blog/_doc/1.
Here “1” is the id for our intended document, which we are providing manually. Enter the following JSON content:
{
"title": "This is a test blog post",
"body":"A dummy content for body of the post"
}
Now hit the Send button and you should see output like this:
If you send the data again with the same id, Elasticsearch replaces the document and increments its version. This is basically an update operation.
Let’s do that, since we also need to store the “tags” field. Modify the JSON content as shown below and hit Send:
{
"title": "This is a test blog post",
"body":"A dummy content for body of the post",
"tags":["blog","post","test"]
}
You should see output something like:
As you can see, the version of the record got updated from 1 to 2.
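The version bump can be imitated with a small in-memory sketch. Nothing here talks to Elasticsearch: the function index_document and the $blog array are hypothetical stand-ins that only mimic the “_version” and “result” fields we saw in the responses, assuming that sending a document to an existing id replaces the stored one.

```php
<?php
// In-memory stand-in for an index: id => ['_version' => int, '_source' => array].
// It only imitates the bookkeeping visible in the responses above.
function index_document(array &$index, string $id, array $source): array
{
    // A new id starts at version 1, an existing id gets its version incremented.
    $version = isset($index[$id]) ? $index[$id]['_version'] + 1 : 1;

    // The stored source is replaced as a whole, not merged field by field.
    $index[$id] = ['_version' => $version, '_source' => $source];

    return [
        '_id'      => $id,
        '_version' => $version,
        'result'   => $version === 1 ? 'created' : 'updated',
    ];
}

$blog = [];
$first = index_document($blog, '1', ['title' => 'This is a test blog post']);
$second = index_document($blog, '1', [
    'title' => 'This is a test blog post',
    'tags'  => ['blog', 'post', 'test'],
]);

echo json_encode($first), PHP_EOL;
echo json_encode($second), PHP_EOL;
```

Because the whole document is replaced, the second request has to repeat the “title” and “body” fields along with the new “tags” field, just like we did in Thunder Client.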
Insert another record without entering any manual id number. Click the “New Request” button on the left side of the editor, select the “POST” method, and enter the URL as http://127.0.0.1:9200/blog/_doc.
Enter the following JSON content:
{
"title": "What is Lorem Ipsum?",
"body":"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.",
"tags":["lorem","ipsum","capsicum"]
}
Notice the “_id” value. It got assigned automatically.
To fetch all documents in the index, click the “New Request” button on the left side of the editor, select the “GET” method, enter the URL as http://127.0.0.1:9200/blog/_search, and hit the Send button:
Let’s delete the index so that we can start afresh.
Click the “New Request” button on the left side of the editor, select the “DELETE” method, enter the URL as http://127.0.0.1:9200/blog, and hit the Send button.
Now when you search again, you should get an “index_not_found_exception” error.
We are now set to perform operations programmatically.
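As a recap, the manual steps above boil down to a handful of HTTP requests. The sketch below only describes them as method, URL, and JSON body and does not send anything. The helper describe_request and the ES_BASE constant are hypothetical names used for illustration.

```php
<?php
// Base URL of the local node used throughout the manual tests.
const ES_BASE = 'http://127.0.0.1:9200';

// Hypothetical helper: describes a request without sending it.
function describe_request(string $method, string $path, ?array $body = null): array
{
    return [
        'method' => $method,
        'url'    => ES_BASE . '/' . ltrim($path, '/'),
        'body'   => $body === null ? null : json_encode($body),
    ];
}

// The same sequence we clicked through in Thunder Client.
$steps = [
    describe_request('PUT', 'blog'),                 // create the index
    describe_request('POST', 'blog/_doc/1', [        // index a document with id 1
        'title' => 'This is a test blog post',
        'body'  => 'A dummy content for body of the post',
    ]),
    describe_request('POST', 'blog/_doc', [          // let Elasticsearch assign the id
        'title' => 'What is Lorem Ipsum?',
    ]),
    describe_request('GET', 'blog/_search'),         // fetch all documents
    describe_request('DELETE', 'blog'),              // drop the index
];

foreach ($steps as $step) {
    echo $step['method'] . ' ' . $step['url'] . PHP_EOL;
}
```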
Setup client
First, install the Apache web server.
sudo apt update && sudo apt install apache2 && sudo apt install libapache2-mod-php
Then give index.php priority over its HTML counterpart. Open /etc/apache2/mods-enabled/dir.conf and change:
DirectoryIndex index.html index.cgi index.pl index.php index.xhtml index.htm
To
DirectoryIndex index.php index.html index.cgi index.pl index.xhtml index.htm
Restart Apache service:
sudo systemctl restart apache2
Now we are ready to code our web application. Since it is going to be a PHP application, let’s install the PHP command-line interface along with unzip, which Composer uses to extract packages.
sudo apt install php-cli unzip
We need a PHP client for Elasticsearch. For that we need Composer. Composer is a PHP package dependency manager, like NPM for NodeJS.
To install composer, simply run:
curl -sS https://getcomposer.org/installer -o /tmp/composer-setup.php
sudo php /tmp/composer-setup.php --install-dir=/usr/local/bin --filename=composer
Check the installation by running:
composer --version
Now go to /var/www/html and run this command to download the client:
composer require elasticsearch/elasticsearch
This will create a vendor directory and a composer.json file.
Now create an additional directory with a client initialization script
mkdir app
cd app
touch init.php
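Paste the following content in the init.php file:
<?php
// Load Composer's autoloader from the project root (one level above app/).
require_once __DIR__ . '/../vendor/autoload.php';
// Version 8 of the official PHP client is created through its ClientBuilder.
$es = Elastic\Elasticsearch\ClientBuilder::create()
    ->setHosts(['http://127.0.0.1:9200'])
    ->build();
?>
The above code initializes the Elasticsearch client, which connects to the node running on localhost at port 9200.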
Time for the web interface.
We are going to create two screens, one for searching and another one for adding data. We are going to keep the interface to a minimum.
So let’s create the first screen, index.php, and enter the following content:
<?php
require_once "app/init.php";
if(isset($_GET['q'])){
$q = $_GET['q'];
$query = $es->search([
'body'=> [
'query' => [
'bool' => [
'should' => [
[ 'match' => [ 'title' => $q ] ],
[ 'match' => [ 'body' => $q ] ],
]
]
]
]
]);
if($query['hits']['total']["value"] >= 1){
$results = $query['hits']['hits'];
}
}
?>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Search | ElasticSearch Demo</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-9ndCyUaIbzAi2FUVXJi0CjmCapSmO7SnpJef0486qhLnuZ2cdeRhO02iuK6FUUVM" crossorigin="anonymous">
</head>
<body>
<div class="row mt-3">
<div class="mx-auto col-10 col-md-8 col-lg-6">
<!-- We are setting the method to GET for the form, so we can see the entered text in the URL -->
<form action="<?=$_SERVER["PHP_SELF"]?>" method="GET" autocomplete="off">
<div class="row mb-3">
<div class="col">
<input type="text" class="form-control" name="q" placeholder="Enter text to Search Blog" />
</div>
</div>
<div class="row mb-3">
<div class="col">
<input type="submit" class="btn btn-primary" value="Search" />
</div>
</div>
</form>
<?php
if(isset($results)){
foreach($results as $r){
?>
<div class="row mb-3">
<div class="col">
<div class="alert alert-secondary" role="alert">
<p class="fw-bolder"><?=htmlspecialchars($r["_source"]["title"])?></p>
<?=htmlspecialchars(implode(",", $r["_source"]["tags"] ?? []))?>
</div>
</div>
</div>
<?php
}
}else{
echo '<div class="alert alert-danger" role="alert">
No data found
</div>';
}
?>
</div>
</div>
<!-- We are going to skip validation checks -->
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js" integrity="sha384-geWF76RCwLtnZ8qwWowPQNguL3RmwHVBC9FhGdlKrxdiJJigb/j/68SIy3Te4Bkz" crossorigin="anonymous"></script>
</body>
</html>
Let’s analyze the code above. The PHP code before the HTML doctype declaration is divided into three parts:
i) Requiring the Elasticsearch client, which we created before, in this script.
ii) If we get the URL parameter ‘q’, we run a search query on Elasticsearch against the “title” and “body” fields.
iii) $query['hits']['total']['value'] holds the number of matching documents. If there is at least one, we initialize a “results” variable like this:
$results = $query['hits']['hits'];
After the closing HTML form tag, we check whether the “results” variable is set. If it is, we loop through the documents it contains and display the “title” and “tags” field values.
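The two steps described above, building the query and printing a hit, can be isolated in a small sketch that runs without an Elasticsearch node. The helpers build_search_body and render_hit are hypothetical names. The first returns the same bool/should structure used in index.php. The second shows one way to escape a hit before echoing it into HTML, assuming the “tags” field may be missing on some documents.

```php
<?php
// Hypothetical helper: builds the same bool/should body used in index.php.
// A document matches if the text is found in "title" or in "body".
function build_search_body(string $q): array
{
    return [
        'query' => [
            'bool' => [
                'should' => [
                    ['match' => ['title' => $q]],
                    ['match' => ['body' => $q]],
                ],
            ],
        ],
    ];
}

// Hypothetical helper: escapes a hit before it is echoed into the page.
function render_hit(array $hit): string
{
    $title = htmlspecialchars($hit['_source']['title'] ?? '', ENT_QUOTES, 'UTF-8');
    $tags  = htmlspecialchars(implode(', ', $hit['_source']['tags'] ?? []), ENT_QUOTES, 'UTF-8');

    return "<p class=\"fw-bolder\">{$title}</p>{$tags}";
}

echo json_encode(build_search_body('lorem'), JSON_PRETTY_PRINT), PHP_EOL;
echo render_hit(['_source' => ['title' => '<b>Hi</b>', 'tags' => ['a', 'b']]]), PHP_EOL;
```

Keeping the query in a function of its own makes it easy to inspect the JSON that will be sent, which helps when a search does not return what you expect.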
Let’s create the final web interface for creating documents in the index. Create an “add.php” file and enter the following content:
<?php
require_once "app/init.php";
if(!empty($_POST)){
if(isset($_POST["title"]) && isset($_POST["body"]) && isset($_POST["tags"])){
$title = $_POST["title"];
$body = $_POST["body"];
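// Split the comma-separated tags and trim stray whitespace around each one.
$tags = array_map('trim', explode(",", $_POST["tags"]));
// Only request parameters belong at the top level; the document itself goes under "body".
$indexed = $es->index([
"index" => "blog",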
"body" => [
'title' => $title,
'body' => $body,
'tags' => $tags
]
]);
if($indexed){
echo '<div class="alert alert-success mt-3 mb-3" role="alert">
Document inserted successfully!
</div>';
}
}
}
?>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Create | ElasticSearch Demo</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-9ndCyUaIbzAi2FUVXJi0CjmCapSmO7SnpJef0486qhLnuZ2cdeRhO02iuK6FUUVM" crossorigin="anonymous">
</head>
<body>
<div class="row mt-3">
<div class="mx-auto col-10 col-md-8 col-lg-6">
<!-- We are setting the method to POST because this form creates a document -->
<form action="<?=$_SERVER["PHP_SELF"]?>" method="POST" autocomplete="off">
<div class="row mb-3">
<div class="col">
<input type="text" name="title" class="form-control" placeholder="Enter Title" />
</div>
</div>
<div class="row mb-3">
<div class="col">
<textarea name="body" rows="8" class="form-control" placeholder="Enter Body content"></textarea>
</div>
</div>
<div class="row mb-3">
<div class="col">
<input type="text" name="tags" class="form-control" placeholder="Enter comma separated Tags" />
</div>
</div>
<div class="row mb-3">
<div class="col">
<input type="submit" class="btn btn-primary" value="Create" />
</div>
</div>
</form>
</div>
</div>
<!-- We are going to skip validation checks -->
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js" integrity="sha384-geWF76RCwLtnZ8qwWowPQNguL3RmwHVBC9FhGdlKrxdiJJigb/j/68SIy3Te4Bkz" crossorigin="anonymous"></script>
</body>
</html>
Let’s analyze the code above. The PHP code before HTML doctype declaration is sub-divided into three parts:
i) Like the search page, we include the Elasticsearch client in this script.
ii) If the page is submitted with all the form fields, then we are going to perform an insert operation on the “blog” index in Elasticsearch.
iii) The result of the operation is stored in the “indexed” variable. If it is not empty, then we are displaying a success message using the Bootstrap alert component.
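The form handling in add.php can be sketched in isolation as well. The helpers parse_tags and build_index_params are hypothetical names: the first turns the comma-separated input into a clean list, and the second assembles the array that would be handed to the client's index() call. Nothing is sent to Elasticsearch here.

```php
<?php
// Hypothetical helper: turns "a, b ,, c" into ['a', 'b', 'c'].
function parse_tags(string $raw): array
{
    $tags = array_map('trim', explode(',', $raw));

    // Drop empty entries and re-index the array so it encodes as a JSON list.
    return array_values(array_filter($tags, fn (string $t): bool => $t !== ''));
}

// Hypothetical helper: assembles the parameters for indexing a blog post.
function build_index_params(string $title, string $body, string $rawTags): array
{
    return [
        'index' => 'blog',
        'body'  => [
            'title' => $title,
            'body'  => $body,
            'tags'  => parse_tags($rawTags),
        ],
    ];
}

print_r(build_index_params('Hello', 'World', ' php, elasticsearch ,, search '));
```

Trimming and filtering the tags keeps entries like " search " or an empty string out of the index, so a search for a tag does not depend on how carefully the form was filled in.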
You can then go back to the Search screen (index.php) in the browser and try searching for something. Below is a screenshot of a generic search result for the character ‘a’:
The Git repo for this article can be found here.
That’s it! Hope you found the article useful.
Happy Coding!
If you found this post helpful, please like, share and follow me. I am a developer, transitioning to DevOps, not a writer — so each article is a big leap outside my comfort zone.
If you need a technical writer or a DevOps engineer, do connect with me on LinkedIn: linkedin.com/in/mubin-khalife.
Thank you for reading and for your support!