Tuesday, August 8, 2017
ElasticSearch Indexing Performance Tuning
Hi All,
Here I am going to provide you with some tips for improving indexing performance in Elasticsearch. If you are doing indexing-heavy operations, these tips can help you improve performance to a great extent.
Before Performance Tuning
Before concluding that indexing is too slow, be sure that the cluster's hardware is fully utilized: use tools like iostat, top, and ps to confirm that CPU or I/O is saturated across all nodes. If it is not, the cluster needs more concurrent requests; but if EsRejectedExecutionException is thrown from the Java client, or a TOO_MANY_REQUESTS (429) HTTP response is returned for REST requests, then there are too many concurrent requests.
Since the settings discussed here are focused on maximizing indexing throughput for a single shard, it is best to first test just a single node, with a single shard and no replicas, to measure what a single Lucene index is capable of on your documents, and iterate on tuning that, before scaling it out to the entire cluster. This also gives a baseline to roughly estimate how many nodes you will need in the full cluster to meet your indexing throughput requirements.
Once a single shard is working well, you can take full advantage of Elasticsearch's scalability across multiple nodes in your cluster by increasing the shard count and replica count.
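As a sketch of that baseline setup (the index name here is an assumption), an index with a single shard and no replicas can be created via the index settings:

```json
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
```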
1. Limit the number of analyzed fields in the index.
Analyzed fields are passed through an analyzer, which converts the string into a list of individual terms before indexing. This reduces indexing performance. The analysis process is what allows Elasticsearch to search for individual words within each full-text field. Analyzed fields are not used for sorting and are seldom used for aggregations.
(The string field type is unsupported for indexes created in 5.x, in favor of the text and keyword types. Attempting to create a string field in an index created in 5.x will cause Elasticsearch to attempt to upgrade it to the appropriate text or keyword type. text is an analyzed field; keyword is not analyzed.)
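As a sketch of a 5.x mapping (index, type, and field names are assumptions), use text for full-text fields that need analysis, and keyword for fields used only for exact matching, sorting, or aggregations:

```json
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "description": { "type": "text" },
        "status":      { "type": "keyword" }
      }
    }
  }
}
```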
2. Disable merge throttling.
Merge throttling is Elasticsearch's automatic tendency to throttle indexing requests when it detects that merging is falling behind indexing. It makes sense to update the cluster settings to disable merge throttling (by setting indices.store.throttle.type to "none") if you need to optimize indexing performance rather than search. This change can be made persistent (meaning it will persist after a cluster restart) or transient (resets to the default upon restart), depending on the use case.
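On versions of Elasticsearch that still support this setting, the change can be made through the cluster settings API; for example, as a transient update:

```json
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "none"
  }
}
```

Use "persistent" instead of "transient" if the setting should survive a cluster restart.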
3. Increase the refresh interval.
Increase the refresh interval in the index settings API. By default, the index refresh process occurs every second, but during heavy indexing periods, reducing the refresh frequency can help alleviate some of the workload.
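For example (index name and interval are assumptions), the refresh interval could be raised from the 1 s default to 30 s during a heavy ingestion run, or set to -1 to disable refresh entirely, and restored afterwards:

```json
PUT /my_index/_settings
{
  "index": { "refresh_interval": "30s" }
}
```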
4. Increase the translog flush threshold size.
When a document is indexed in Elasticsearch, it is first written to a write-ahead log file called the translog. When the translog is flushed (by default it is flushed after every index, delete, update, or bulk request, when the translog reaches a certain size, or after a time interval), Elasticsearch persists the data to disk during a Lucene commit, which is an expensive operation.
The translog helps prevent data loss in the event that a node fails. It is designed to help a shard recover operations that might otherwise have been lost between flushes.
Once the translog hits the index.translog.flush_threshold_size, a flush will happen. index.translog.flush_threshold_size can be increased from the default 512 MB to something larger, such as 1 GB, which allows larger segments to accumulate in the translog before a flush occurs. By letting larger segments build up, flushes happen less often, and the larger segments merge less often. All of this adds up to less disk I/O overhead and better indexing rates.
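A sketch of raising the threshold to 1 GB on an index (the index name is an assumption):

```json
PUT /my_index/_settings
{
  "index": {
    "translog.flush_threshold_size": "1gb"
  }
}
```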
5. Disable replicas during heavy ingestion.
When documents are replicated, the entire document is sent to the replica node and the indexing process is repeated verbatim. This means each replica will perform the analysis, indexing, and potentially merging work.
In contrast, if you index with zero replicas and then enable replicas once ingestion is finished, the recovery process is essentially a byte-for-byte network transfer. This is much more efficient than duplicating the indexing process.
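A sketch of this workflow (the index name is an assumption): drop replicas to zero before the bulk load, then restore them once ingestion finishes:

```json
PUT /my_index/_settings
{ "index": { "number_of_replicas": 0 } }

# ... run the bulk ingestion here ...

PUT /my_index/_settings
{ "index": { "number_of_replicas": 1 } }
```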
6. Index IDs – use auto-generated IDs.
When indexing a document that has an explicit ID, Elasticsearch needs to check whether a document with the same ID already exists within the same shard, which is a costly operation that gets even more costly as the index grows. By using auto-generated IDs, Elasticsearch can skip this check, which makes indexing faster.
Note: This can improve indexing performance greatly.
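To illustrate the difference (index and type names are assumptions): a POST without an ID lets Elasticsearch generate one, while a PUT with an explicit ID forces the existence check:

```json
# auto-generated ID – no duplicate check needed
POST /my_index/my_type
{ "user": "kim" }

# explicit ID – Elasticsearch must first check whether ID 1 already exists
PUT /my_index/my_type/1
{ "user": "kim" }
```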
7. The number of nodes.
There is no hard-and-fast rule for determining the number of nodes required for a cluster. A good approach is to start with a single node and then increase the number of nodes until you get the expected performance.
A node is a single server/running instance that is part of the cluster, stores data, and participates in the cluster's indexing and search capabilities.
Once a single node has reached its maximum performance (CPU, memory, I/O), a new node can be added to the cluster and the load can be balanced across the cluster. This is done through the Elasticsearch client, using a round-robin strategy to balance the load across the nodes; the Transport Client does this automatically.
8. Tweak the JVM options – increase the heap size.
By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 2 GB. When moving to production, it is important to configure the heap size to ensure that Elasticsearch has enough heap available. Elasticsearch will assign the entire heap specified in jvm.options via the Xms (minimum heap size) and Xmx (maximum heap size) settings.
The value for these settings depends on the amount of RAM available on your server. Good rules of thumb are:
- Set the minimum heap size (Xms) and the maximum heap size (Xmx) equal to each other.
- The more heap available to Elasticsearch, the more memory it can use for caching. But note that too much heap can subject you to long garbage-collection pauses.
- Set Xmx to no more than 50% of your physical RAM, to ensure that there is enough physical RAM left for kernel file-system caches.
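As a config sketch, on a machine with 32 GB of RAM the heap settings in jvm.options might look like this (the 16 GB figure follows the 50%-of-RAM rule above):

```
# jvm.options – keep Xms and Xmx equal
-Xms16g
-Xmx16g
```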
9. Bulk processor tuning.
Bulk indexing requests should be used for optimal performance. Bulk sizing depends on the data, the analysis, and the cluster configuration, but a good starting point is 5–15 MB per bulk. Note that this is physical size; document count is not a good metric for bulk size. For example, if 1,000 documents are indexed per bulk:
- 1,000 documents at 1 KB each is 1 MB.
- 1,000 documents at 100 KB each is 100 MB.
Those are drastically different bulk sizes. Bulks need to be loaded into memory at the coordinating node, so it is the physical size of the bulk that matters more than the document count.
Start with a bulk size around 5–15 MB and slowly increase it until there is no further performance gain. Then start increasing the concurrency of the bulk ingestion (multiple threads, and so forth).
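A sketch of the bulk request format (index and type names are assumptions): each action line is followed by its source document, newline-delimited:

```json
POST /_bulk
{ "index": { "_index": "my_index", "_type": "my_type" } }
{ "user": "kim", "age": 25 }
{ "index": { "_index": "my_index", "_type": "my_type" } }
{ "user": "lee", "age": 31 }
```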
Hope that helps.
Thank You.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/heap-size.html
Monday, March 20, 2017
Develop Your First REST Web Service with Java Spring
Hi All,
Let's develop our first REST web service. There are various libraries and methods you can use to develop a REST API, but here we are going to use the Spring Framework to get this done in a much easier manner. I am using IntelliJ IDEA 2016 to develop the application.
First, let's create a Maven project in IDEA. We are not going to select an archetype; we will just use the existing template.
Let's update the pom.xml. We need to specify our Spring dependencies and also the Tomcat plugin to run our service locally. We will also need to add Jackson to do the JSON mapping for us. Once you have updated the pom file, it should look like the below.
Once the dependencies are added to the project, you can check that by going into the project structure.
Then we need to specify the webapp folder where the necessary web resources are present. It's important that we adhere to the given structure; otherwise, during the project build and packaging, we would need to edit the pom to point to where our resources actually reside. The project structure should look like the below.
Now let's create a package and add our REST controller class inside it. The REST controller acts as an endpoint for requests: it receives requests such as GET, POST, etc., and replies with the necessary output according to the request. To mark this class as a controller, we use the '@RestController' annotation. Then we can optionally use '@RequestMapping' to specify the path to our controller from the root. In order to include the headers for CORS (Cross-Origin Resource Sharing), we use the '@CrossOrigin' annotation. To read more about CORS, you can refer to this link.
Now we have set up our REST controller. Let's add some REST methods to get some work done with it. There are several REST methods, such as GET, POST, PUT, PATCH, and DELETE. You can read about the REST methods here.
Before adding the methods, let's create a simple Person class and a list of people inside our controller, just to test both the GET and POST methods.
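As a sketch of what such a class might look like (the field names are assumptions), note that Jackson needs a no-argument constructor and getters/setters to map JSON to and from the POJO:

```java
// A plain POJO that Jackson can bind JSON to and from.
// Field names here are illustrative assumptions.
public class Person {

    private int id;
    private String name;
    private int age;

    // Jackson requires a no-argument constructor for deserialization.
    public Person() { }

    public Person(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```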
Now we can go and create our methods. Following is the complete code for the controller class.
We also need to add the servlet.xml and web.xml inside WEB-INF folder.
Now let's run our program first and then look into the methods and what we have done in each.
In order to run the program, we can create a new debug configuration in IntelliJ as follows. Then you can click on the debug button to debug the program.
In order to check the results, it's good to install the Postman plugin for Chrome.
Now let's check our results while referring to the methods.
The first method is a GET method that creates a Person and returns the object as the response. The person object is converted to a JSON object automatically with Jackson Data Binding. The method is just a simple GET method without any arguments passed into it. You can access the web service from the following URI.
http://localhost:8080/JavaSpringRESTDemo/learning/newperson
If you check the result through Postman it will be as following.
The second method, addPerson, is a POST method that accepts a Person as a JSON object and adds it to the list. Here the Person object is again sent as JSON and is mapped to a POJO by Jackson data binding. You can send the POST request as follows.
Once you send the request, the response will be the existing list, with the added Person, as a JSON object.
Now let's use a URL parameter to get a person when the user ID is specified. The URL parameter is declared with the annotation @RequestParam(value = "id"). So we specify the URL as follows and get the output object.
There is another way to specify a parameter, in the URL path itself: using a path variable. The final method is developed to accept a path variable containing a user ID. You can use the following URL to get the person with ID 15.
So that should be it for now. I hope to publish more Spring-related content.
Hope that helps.
Thank you.
Sunday, March 19, 2017
Communicate Within Wars Inside Same Container
Hi All,
There are certain scenarios where you want to communicate between the WARs inside the same container without using the network. For example, you may need to avoid the network delay that would be caused by using web services, RMI, or HTTP.
What I'm going to show you is one working solution to the above problem. We can introduce a common library that is a dependency of both services. Through this intermediate library, we can carry out the communication between the WAR files.
For this example, I'm using a service called 'Front Service' to accept the user request, and a second service called 'Ground Service' that contains a method that needs to be called by the Front Service. The communication between the services is done through a JAR named 'Common-Lib'.
Common-Lib has an interface that mirrors the Ground Service: it contains the method definitions of the Ground Service.
Common-Lib also has a class that gets and sets the service instances registered with the JAR.
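A minimal sketch of those two common-lib pieces, with names assumed from the post's description (the real method signatures may differ):

```java
// Shared interface mirroring the Ground Service's methods (lives in common-lib).
interface IGroundService {
    String getMessage();
}

// Static registry through which the WARs exchange service instances.
// Both WARs see the same class (and its static state) only because
// common-lib is loaded by Tomcat's shared classloader, not per webapp.
class ServiceHandler {

    private static IGroundService groundService;

    public static void setGroundService(IGroundService service) {
        groundService = service;
    }

    public static IGroundService getGroundService() {
        return groundService;
    }
}
```

The Ground Service registers its implementation at startup (ServiceHandler.setGroundService(...)), and the Front Service simply calls ServiceHandler.getGroundService().getMessage().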
Then let's develop the Front Service. This is a simple Spring service that gets a message from the Ground Service.
Finally we need to implement the Ground Service.
Here we have the Ground Service class and the helper classes to map a service instance to the Service Handler. There is a class named GroundServiceConfig, that sets an instance of the IGroundService interface to the Service Handler. IGroundService is implemented in the GroundServiceAdapter class.
Now we have the two services and the common library defined. It's important to note that although common-lib is a dependency of both services, we do not bundle it with them. Therefore, in the pom.xml, we specify the 'common-lib' dependency scope as provided.
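As a config sketch, the dependency entry in each service's pom.xml might look like this (groupId and version are assumptions):

```xml
<dependency>
    <groupId>com.example</groupId>
    <artifactId>common-lib</artifactId>
    <version>1.0</version>
    <scope>provided</scope>
</dependency>
```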
We then need to deploy common-lib into the 'lib' directory inside the Tomcat installation.
What happens here is that when the Ground Service initializes, it assigns a new instance of IGroundService to the Service Handler. This instance can then be used to call the service methods from the other services that use 'Common-Lib' as a dependency.
Now let's deploy the two services inside webapps and put common-lib inside 'lib'.
Let's go to the Front Service URL and check the result. As you can see we have got the message successfully from the GroundService.
The complete projects and code can be found here.
Hope that helps.
Thank You.