Wednesday, August 13, 2008
Getting Hardware and OS Configurations
Below is the list of commands, organized by operating system...
LINUX
Operating System Installed:
$> cat /etc/issue
CPU details: number of CPUs, CPU speed, vendor, model, and a lot more:
$> cat /proc/cpuinfo
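For example, you can filter /proc/cpuinfo with grep to pull out a specific detail (these two one-liners are just illustrations):
Number of logical CPUs:
$> grep -c ^processor /proc/cpuinfo
CPU model string:
$> grep "model name" /proc/cpuinfo | head -1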
Tuesday, August 5, 2008
How to Get a Thread Dump?
- Some basic jargon.
- What is a thread dump, and what is it used for?
- How can I get a thread dump?
- How to analyse it?
Jargon:
Some of the jargon to cover before going into the details..
Thread: It is very important to know what a thread is before going into this topic. By the textbook definition, a thread can be loosely defined as a separate stream of execution that takes place simultaneously with, and independently of, everything else that might be happening. Just leave it if you don't understand :)
Multi-threaded application/process: A process may run as a single thread or with multiple threads. For example, Oracle DB runs as several processes, and each process can contain one or many threads. If a process runs with many threads, we call it a multi-threaded process. Generally, using multiple threads helps us use the processor resources effectively. For example, if your system has 4 processors and your app is single threaded, then at any point in time only one CPU (processor) is used and the other 3 sit idle, so we are wasting resources. If we redesign the app to work with 4 threads, all the CPUs will be used effectively.
Many of the applications we use let the user decide how many threads the process can run with. The best example is application servers: for almost all of them, we can configure how many threads they should run with, as the sketch below illustrates.
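As a minimal sketch of the same idea in plain Java (the class name and the pool size of 4 are just arbitrary examples, ideally matched to your processor count):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) {
        // A worker pool whose thread count is configurable, just like
        // the thread settings an application server exposes.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            final int task = i;
            pool.submit(new Runnable() {
                public void run() {
                    System.out.println(Thread.currentThread().getName()
                            + " ran task " + task);
                }
            });
        }
        pool.shutdown();
    }
}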
What is a thread dump?
A thread dump contains the complete list of threads and their details. These details include...
- The thread number or name.
- The thread state - running, waiting, blocked, idle, etc.
- The locks the thread is holding, and the method in which each lock was acquired.
- The method the thread is currently executing.
The structure of this thread dump differs from OS to OS and from application to application. As said above, this dump is mainly used to find lock contention and deadlocks.
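To give a feel for the format, a single entry in a Sun JVM thread dump looks roughly like this (the thread name, addresses and line numbers are made up for illustration):

"pool-1-thread-3" prio=10 tid=0x08a12000 nid=0x1a2b waiting for monitor entry [0x9f3f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.OrderService.update(OrderService.java:42)
        - waiting to lock <0x24d0a118> (a java.lang.Object)
        at com.example.Worker.run(Worker.java:17)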
Lock contention: In simple words, if a thread holds a lock for a long time while another thread waits for that lock to be released, we call it lock contention.
Deadlock: This is something like the father of lock contention :) For example, consider that t1 is holding a lock for which t2 is waiting, t2 is holding a lock for which t3 is waiting, and t3 is holding a lock for which t1 is waiting. Can you see the chain there? Now all 3 threads are waiting for a lock held by another thread, and hence this state remains as it is forever. Once the system comes to this state, everything gets stuck and the application comes to a dead end. This is deadlock. The sketch below shows the classic two-thread version.
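Here is a minimal Java sketch of the two-thread case (run it and both threads will usually freeze within a moment; the class and lock names are made up for illustration):

public class DeadlockDemo {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        // t1 takes lockA and then wants lockB...
        Thread t1 = new Thread(new Runnable() {
            public void run() {
                synchronized (lockA) {
                    pause();                  // give t2 time to grab lockB
                    synchronized (lockB) { }  // blocks forever
                }
            }
        });
        // ...while t2 takes lockB and then wants lockA.
        Thread t2 = new Thread(new Runnable() {
            public void run() {
                synchronized (lockB) {
                    pause();
                    synchronized (lockA) { }  // blocks forever
                }
            }
        });
        t1.start();
        t2.start();
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException e) { }
    }
}

Take a thread dump while this is running (see below for how) and the dump itself will report the deadlock for you.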
How can I get a thread dump?
I mostly work on Java/J2EE applications, so let us see how a thread dump is captured for the BEA JRockit JVM and the Sun JVM.
NOTE: An important thing to note is that if you want to take a thread dump of a Java process, make sure you don't use "-Xrs" in your JVM options.
BEA JRockit JVM:
The JRockit JVM comes with a built-in management console which provides an option to get a thread dump. It even lists all the thread locks and deadlocks at the end of the dump. Below is the exact process...
For a detailed explanation of the BEA JRockit Management Console, go to:
http://edocs.bea.com/jrockit/tools/usingjmc/start.html
Steps:
- Start your JVM with -Xmanagement -Djrockit.managementserver.port=<un-used port> (an example command line is shown after these steps).
- Start the JRockit Console by launching the management console jar (the path below is relative to your JRockit install directory). This will open the console window.
/console/ManagementConsole.jar
- Click on Connect, give the necessary details, and get connected to the JVM first.
- Now get the thread dump by clicking View --> "View Thread stack dump".
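For example, the startup might look something like this (the port number and jar name here are just placeholders):

$> java -Xmanagement -Djrockit.managementserver.port=7090 -jar myapp.jar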
As said earlier, the beauty of this tool is that it analyzes the thread dump and reports the current lock contentions and deadlocks automatically, without you breaking your head. The contention details it gives are as good as anything you could get by analyzing manually.
Sun JVM:
If you are using JDK 1.6 or above, then you are lucky enough to get dumps in a faster way, as below:
- Add the JDK's bin directory to your PATH variable.
- Get the PID using the command "jps".
- Then run "jstack -l PID".
- You can even redirect the dump to a file: "jstack -l PID > FileName". A sample session is shown below.
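A sample session might look like this (the PID 12345 and the class name are just illustrative):

$> jps
12345 MyServer
$> jstack -l 12345 > threaddump.txt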
- Windows OS, process running in console mode in a command prompt: press Ctrl+Break in that console window and the thread dump is printed straight to the console.
- Windows OS, process running as a service:
- Download SendSignal.exe from http://www.latenighthacking.com/projects/2003/sendSignal
- Find the process ID of your process. You can get this from Task Manager.
- Go to the command prompt and execute the command below (as far as I recall, the tool simply takes the process ID as its argument):
sendsignal PID
This will print the thread dump in your application's logs. For example, if you are using Oracle Application Server then the dump will be printed in opmn.log. Just check all the logs where the general errors are printed.
- Unix operating systems:
If you are using Unix or Linux then the process is very simple. Just find the process ID using the "ps -ef" command and send the process a QUIT signal, which for a JVM triggers a thread dump:
kill -3 PID
In this case too, the thread dump is printed in the standard logging files.
Analysing a thread dump is simple yet powerful for finding many performance bottlenecks. Please see my other posting on thread dump analysis...
Basics of Garbage Collection in Java
- What does it clean?
- Why does it clean?
- How does it clean?
- Why should I know this?
In Java, all the dynamic objects you create are allocated on the heap. Say I am in the method main() and I create some objects, m1, m2 and m3, in the main method, and then I call the method First(). Let us assume this First() method creates 2 objects, f1 and f2. Now think of what the heap and the stack look like at this point.
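In code, the scenario is roughly this (the class and variable names are just for illustration):

public class GcDemo {
    public static void main(String[] args) {
        Object m1 = new Object();   // allocated on the heap,
        Object m2 = new Object();   // referenced from main()'s stack frame
        Object m3 = new Object();
        first();
        // Once first() returns, f1, f2 (and f3) are unreachable:
        // nothing on any stack or in the root set points to them anymore.
    }

    static void first() {
        Object f1 = new Object();
        Object f2 = new Object();
        // imagine f1 in turn creates and references another object, f3
    }
}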
Every method has its own stack frame where its local variables live; the dynamic objects themselves are created in the heap, and a pointer to each is maintained via the root set.
The root set is the one which keeps track of all the first-level objects. The method's dynamic variable, say f1, points to an entry in the root set, and the address in that entry points to the heap address where the memory was actually allocated by the JVM for that object. In this example there is also an object f3, which was created by f1. The objects reachable directly from the root set are called first-level objects.
Now the actual story begins... When we finish the work in the method First() and return back to main(), all the references into the root set are removed, and eventually the reference links from the root set to the objects in the heap are removed as well. The first question here is: who will clean the heap? We are left with 3 objects that are no longer required, and someone has to clean them out! This is where the JVM's garbage collector comes into the picture... This is the wonderful feature that C or C++ doesn't have. In the case of C we have to clean up the memory ourselves - technically, we have to deallocate the memory that we allocated, and if we don't, memory is leaked. (I will talk about memory leaks clearly next.) Anyway, nowadays there are some efficient memory allocators on the market which, if you use them, take care of this garbage collection for you. OK, let us come back to Java..!!
Finally, in Java, memory cleanup or GC is no longer a developer's headache. Many people therefore say there is no memory leak in Java, as GC takes care of all the cleanup automatically. Definitely this is not true... Let us look at that in more detail in a future blog post...
Now the question arises: how does the GC know what to clean up, and is there an algorithm used by the GC? The answer is yes - let us have a deeper look in my later posts. For now, just remember that the main rule of the garbage collector (from now on I will just call it GC) is: "Remove all unreferenced objects".
So let us consider the example I mentioned above again. Like the First() method, many more methods are called and return back, and hence heap usage grows, as the heap is left with many unused objects. At some point the heap gets full or reaches some threshold, and then the GC thread becomes active and starts cleaning up the heap. This is how it works... It starts with each and every object in the root set, tries to reach the objects in the heap, and marks all the objects it can reach. In this case it can reach only m1 and m2. Then it walks from the beginning of the heap to the end looking for unmarked objects, and removes any it finds. That's it!!! Very simple...
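Here is a very stripped-down sketch of that mark-and-sweep idea, just to make the two phases concrete (the Obj class and the lists stand in for the JVM's real internal structures):

import java.util.ArrayList;
import java.util.List;

public class MarkSweepSketch {
    static class Obj {
        boolean marked;
        List<Obj> references = new ArrayList<Obj>();
    }

    // Phase 1: starting from the root set, mark everything reachable.
    static void mark(Obj obj) {
        if (obj == null || obj.marked) return;
        obj.marked = true;
        for (Obj child : obj.references) {
            mark(child);
        }
    }

    // Phase 2: walk the whole heap and throw away unmarked objects.
    static void sweep(List<Obj> heap) {
        for (int i = heap.size() - 1; i >= 0; i--) {
            Obj obj = heap.get(i);
            if (!obj.marked) {
                heap.remove(i);     // "free" the unreachable object
            } else {
                obj.marked = false; // reset the mark for the next GC cycle
            }
        }
    }

    static void collect(List<Obj> rootSet, List<Obj> heap) {
        for (Obj root : rootSet) {
            mark(root);
        }
        sweep(heap);
    }
}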
But it is not as simple as said above - there is a big algorithm behind it, which I will explain later. This was just to give an idea of how GC basically works.
So the final question: why should I know all this when the JVM does it automatically?
The answer is: yes, you have to know what it does. The main reason, I always say, is to make sure GC is working efficiently. What I mean is, make sure it is not hurting your response time or using too many resources. All this can be done through JVM tuning, GC tuning or heap tuning.
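For instance, a first step is usually just to make GC activity visible and pin the heap size, with something like the line below (the sizes and class name are only placeholders - tune them for your app):

$> java -Xms512m -Xmx512m -verbose:gc MyApp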
Hope this gave a basic picture of the garbage collector.
Don't forget to write comments please... If the concepts are not clear then send a mail directly to me.
Using Onstat commands to tune Informix Database
We are going to see the top 10 onstat commands used to tune an Informix database.
NOTE: All the below points may not hold in all conditions. The overall purpose of this document is to help you spot likely problems at a broad level. There are no fixed rules for interpreting all this data; it all depends on how your application behaves.
1. onstat -F : Page Flushers (cleaners)
The onstat -F command provides information on what types of buffer pool flushing have occurred, as well as the status of each of the system page cleaner threads.
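Its output looks roughly like the excerpt below (the counts are illustrative, not from a real system):

Fg Writes     LRU Writes    Chunk Writes
0             18243         5121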
NOTES:
- onstat -F doesn’t print out as part of the onstat -a command.
- FG writes may occur periodically from a load and do not necessarily indicate a need for additional page cleaners.
In general, foreground writes (FG) should always be zero. A foreground write occurs when a session needs to have a page from disk placed into the buffer pool, but there are no clean/free buffers available. In this case, the sqlexec thread will pick a page off the least recently used side of an LRU queue, write it out, and mark the page as clean. This process is time consuming for a session and should be avoided.
Whether buffer flushing occurs as part of an LRU write or a Chunk Write should be based on the nature of the system. LRU writes do not block the system, but are random in nature. Chunk Writes block the system, but are sequential in nature and tend to be more efficient for large writes. It is therefore preferable to have LRU writes occurring in a production system for normal operations, and Chunk Writes for loads.
If you are having long checkpoints, onstat -F may provide some insight. While a checkpoint is occurring, execute onstat -F -r 2. This will show what chunks the page flushers are writing out to disk every two seconds. By monitoring the data column, it is possible to determine which chunks are taking the longest to write. If one or more chunks continually have long chunk writes relative to the rest of the system, the tables within them might be good candidates for redistribution.
Under normal circumstances, there will be one flusher assigned to each chunk during a checkpoint. LRU Writes and Chunk Writes will vary depending on the type of system you have. An OLTP system should maximize LRU Writes (the best ratio of Chunk Writes to LRU Writes is about 1:2). There will always be some Chunk Writes, but LRU Writes will speed up checkpoint duration. In a DSS system, Chunk Writes should be maximized. Some LRU Writes may still occur in an effort to eliminate foreground writes (Fg Writes).
Also monitor the page cleaners (flushers) at checkpoint time. Make sure they are all busy doing Chunk Writes.
2: onstat -l : Logging
The onstat -l command provides a two-part output. The first part contains information on the physical log buffer and the physical log. The second part contains information on the Logical Log buffer and the Logical Logs.
Key Notes:
- Don't confuse the Physical Log Buffer (in memory) with the Physical Log (on disk). Writes are made first to the Physical Log Buffer. The Physical Log Buffer is then flushed to the Physical Log on disk.
- The Physical Log Buffer is flushed to disk when it hits 75% full, during a checkpoint, or when a flush of the Logical Log Buffer would place a transaction in the Logical Log without its corresponding Physical Log image existing on disk.
- The Physical Log on disk, when it hits 75% full, signals that a checkpoint is required.
- There are two Physical Log buffers and three Logical Log buffers.
- The number of Physical Log buffers and Logical Log buffers cannot be modified.
- An archive must be performed before a newly added Logical Log can become free and be used.
- You may not create more than 32768 logical logs. However, the unique ID field has a maximum value of the size of an integer.
The Physical Log buffer should be tuned relative to how often you want it to flush to disk. Keep in mind that the larger it is, the more you could lose should a memory failure occur. The default is 32K and should not generally be tuned below this value.
The Physical Log on disk should be tuned based on two items: how often checkpoints should occur and how long fast recovery should take. Checkpoints are not driven just by the value of CKPTINTVL, but when the Physical Log hits 75% full. If checkpoints are occurring too frequently (i.e., in less than CKPTINTVL intervals), then the Physical Log may be too small. Oversizing the Physical Log won't hurt anything, but it will waste disk space. The Physical Log also drives the length of Fast Recovery. When the system is shut down normally, the Physical Log is logically marked as empty. If the system comes down abnormally, pages will remain in the Physical Log. During Fast Recovery, Informix Dynamic Server restores these pages. These pages must be re-written through the buffer pool and back to disk to synchronize the DBSpaces with the transaction logs. The smaller the Physical Log, the shorter the recovery time.
There are two items to address when sizing the Logical Logs: the size of each log and the number of logs. Since a Logical Log will not be backed up until it is full, the amount of transactional loss, should a disk failure occur, is directly affected by the size of each Logical Log. As an example, if each Logical Log is 100 Mbytes and a disk failure occurs while the third log is in use, only logs 1 and 2 will have been backed up, even if the third log is 99% full. Therefore, a single logical log should be sized to the amount of information you are willing to lose should a complete disk failure occur. Keep in mind that during a restore, Informix Dynamic Server will attempt to save any logical logs that still exist and have not been backed up.
The number of Logical Logs required is a function of how frequently you wish to back up your logs. If you wish to back up your logs only once every 8 hours, then you should have sufficient log space to allow for transactions over an 8-hour period. For example, if the system fills twenty 10-Mbyte logs in a typical 8-hour window, you need at least 200 Mbytes of total log space. This must be judged by monitoring the onstat -l output over a period of normal system behavior.
The length of transactions also affects the total size of all logs combined. If a single transaction spans more than the maximum percentage of log space allowed, given by LTXHWM (long transaction high water mark), it will be automatically rolled back. The transaction itself might not be large, however, if it is held open too long, relative to other operations, it could easily span too many logs. The default value for LTXHWM is 50(%). It is not recommended to increase this value since the rollback operation for a long transaction also consumes Logical Log space.
3. onstat -m : Message Log
Key Notes:
- Checkpoint entries only appear in the log if pages were actually flushed to disk.
- To handle server level errors, use the ALARMPROGRAM parameter in the configuration file.
The message log should be monitored periodically for potential system problems. Should a session abort with an assertion failure, the message log will contain important information including a stack trace of the session.
For checkpoints, the online log identifies the interval (time between checkpoints) and the duration (the length of the checkpoint). The interval should be no less than the value of CKPTINTVL unless you are using the size of your physical log to control checkpoints (checkpoints occur when the physical log is 75% full). The duration of the checkpoint will vary depending on many factors (e.g., BUFFERS, LRU_MIN_DIRTY, LRU_MAX_DIRTY, CKPTINTVL, CLEANERS).
Other factors can cause checkpoints besides CKPTINTVL. Booting or shutting down the system, creating a DBSpace, or performing an archive are a few.
The message log should be cleaned out from time to time as it can become rather large on a highly active system.
4. onstat -p : Profile statistics
The onstat -p command provides statistical information that has accumulated since either boot time or the last onstat -z command.
Key Notes:
- The values for ovtbls and ovuserthread are leftovers from previous versions of OnLine. Starting with 7.x, the values for these fields are dynamically allocated and therefore the overflows should always be zero.
- The number of flushes is not driven entirely by the number of checkpoints.
- The value for commits and rollbacks is not necessarily reflective of actual transaction requests by sessions. Internally, Informix performs its own commits and rollbacks as well.
- Actual deadlocks (two or more threads waiting on one another's locks) can't actually happen within a single Informix Dynamic Server instance. The thread that would create the deadlock fails and receives an error.
- Page compression occurs when rows have been deleted from a page. The compression is actually the repositioning of the remaining rows on the page such that they are contiguous.
Cached reads and cached writes will usually be above 90%. If they are under 85% in an OLTP system, this may be an indication of a shortage of buffers. Although the %cached writes should be above 85%, the actual value can vary greatly. Low %cached writes may simply indicate that not many actual writes have occurred. This can be determined by examining the number of writes, rewrites, and deletes relative to the number of reads.
Bufwaits shows the number of times a thread had to wait to acquire a latch on a buffer. If the readahead values (RA_PAGES and RA_THRESHOLD) are too high, bufwaits may grow excessively. High bufwaits may also be caused by too small a buffer pool relative to the amount of work being performed (see %cached values).
Lock waits should normally be less than 1% of lock requests unless you have applications that hold locks for an inordinate amount of time. If lock waits are excessive, examine the isolation levels used by SQL statements plus the overall design of transaction management.
Checkpoint waits can and do occur in a high volume system. Checkpoints are prevented from occurring if a userthread is in the middle of a critical section (known as a critical write). If a checkpoint must wait, it is usually only for a brief moment.
The sequential scans field should be monitored in an OLTP system, since most OLTP systems operate most efficiently using index reads.
Read aheads allow the system to perform more efficient I/O when a need arises to scan multiple sequential pages from disk. This capability is highly desired in DSS systems where large table and index scans are normal and in certain types of OLTP processing. In an ideal situation, ixda-RA + idx-RA + da-RA should equal RA-pgsused. However, if the values of RA_PAGES and RA_THRESHOLD are too high, the system may end up reading in more pages for a read ahead than it should. This will result in high bufwaits since the pages being read in will force existing buffer pool pages to be overwritten or, in the case of dirty pages, flushed out to disk.
5: onstat -R : LRU Queues
Key Notes:
- The onstat -R command is not included as part of the onstat -a output.
- A free page is one that has not been used. A clean page is a free page that has had a page written to it, but not modified. A dirty page is a clean page that has had modifications made to it.
- Pages can migrate amongst the LRU queues. For this reason, the number queued and the total number of buffers may not always correspond. In addition, the number of buffer pages per LRU queue may become slightly unbalanced.
- Each LRU queue is divided up into four separate priority queues: LOW, MED_LOW, MED_HIGH and HIGH.
- Pages with a priority of HIGH are considered last for page replacement; pages with a priority of LOW are considered first for page replacement. Page replacement priorities are utilized when no free pages exist in the buffer pool.
- Priorities are part of the Dynamic Buffer Management API incorporated into the server.
Upon startup, OnLine will divide the number of pages in the buffer pool among the LRU queues as evenly as possible. The LRU queues are designed to help reduce contention among active threads within the same OnLine instance. In addition, they assist in reducing the processing load required for a checkpoint. When the LRU_MAX_DIRTY percentage of pages for an LRU is exceeded, a page cleaner is assigned to the LRU to begin flushing dirty pages out to disk until only the LRU_MIN_DIRTY percentage of pages remains in the queue.
Setting the value for LRU_MIN_DIRTY and LRU_MAX_DIRTY should be driven by the length of the system checkpoints. If the checkpoint duration is too long, it can possibly be reduced by lowering the values for LRU_MIN_DIRTY and LRU_MAX_DIRTY. This will reduce the overall number of dirty pages remaining in the buffer pool.
Tuning the value for LRUS, which defaults to 8, depends on whether the tuning is being performed to reduce buffer contention or reduce checkpoint duration.
6: onstat -g glo : MT Global Information
Gives the status of the virtual processors in the system. You have to increase the number of sessions each poll thread can handle if the total number of sessions approaches (number of poll threads) x (number of sessions each poll thread can handle) - check the NETTYPE parameter in the onconfig file.
Generally, keep the number of poll threads equal to NUMCPUVPS, and NUMCPUVPS equal to the number of CPUs minus 1. An example NETTYPE entry is shown below.
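For reference, a NETTYPE entry in the onconfig file follows the pattern protocol,poll-threads,connections-per-thread,VP-class; for example (the values are illustrative):

NETTYPE ipcshm,3,50,CPU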
If the lngspins value is high, there is some contention for resources.
Also check the usage of each VP.
7: onstat -g iof : Disk IO Statistics by Chunk/File
The onstat -g iof command displays the statistics for disk I/O by chunk/file.
Examination of the values in each of the operations columns (totalops, dskread, dskwrite) can identify heavy I/O against a particular device or chunk. This information can be used to determine where hotspots exist across the system, and it also shows clearly how well your fragmentation scheme is working.
8: onstat -g ioq : Disk IO Statistics by Queue
The onstat -g ioq command provides statistics on the disk I/O by queue.
- One gfd queue name exists for each chunk in the system.
The information from the onstat -g ioq command can assist in identifying bottlenecks in the processing of I/O through the queues. If the maxlen column for the AIO queue tends to exceed 25, or the len column for the AIO queue tends to exceed 10, additional AIO VPs should be started.
The onstat -g ioq output for the global file descriptors (gfd) or chunks may also identify a potential I/O problem with specific disks in the system if certain gfd queues begin to back up. It may also point out a chunk that contains a table that needs better partitioning (fragmentation).
If the AIO queue keeps building up badly, in most cases I prefer setting NUMAIOVPS equal to the number of chunks.
9: onstat -g iov : Disk IO Statistics by VP
The onstat -g iov command provides disk I/O statistics for each VP. Although the column is called wakeups, it is actually incremented every time the AIO VP goes idle.
The greater the number of I/Os per wakeup, the busier the AIO VP is being kept. This can be used as an indicator for increasing or decreasing the number of VPs for AIO.
Generally, io/wup should be less than 10.
10: onstat -g rea : Ready Threads
The onstat -g rea command displays the threads that are on the ready queue and are awaiting a VP on which to run.
Key Notes:
- A thread priority can range from 1 (lowest) to 4 (highest) with a default priority of 2.
- Threads are pulled off the ready queue based on their priority. Within a priority, threads are taken on a first-in-first-out basis (FIFO).
- Ready threads are displayed from highest priority to lowest priority within each class of virtual processor (VP).
A consistent number of entries on the ready queue may be an indication that additional CPU VPs are required. The number of CPU VPs should never exceed the number of physical CPUs. If there is already a CPU VP for every CPU and threads are still backing up on the ready queue, there may not be enough physical processors, or fast enough processors, on the box for the workload required. A persistently long ready queue can also indicate buffer constraints.