ETL - Informatica: March 2011

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune the performance of a session, first you identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

Once you determine the location of a performance bottleneck, you can eliminate the bottleneck by following these guidelines:
♦Eliminate source and target database bottlenecks. Have the database administrator optimize database performance by optimizing the query, increasing the database network packet size, or configuring index and key constraints.
♦Eliminate mapping bottlenecks. Fine tune the pipeline logic and transformation settings and options in mappings to eliminate mapping bottlenecks.
♦Eliminate session bottlenecks. You can optimize the session strategy and use performance details to help tune session configuration.
♦Eliminate system bottlenecks. Have the system administrator analyze information from system monitoring tools and improve CPU and network performance.
If you tune all the bottlenecks above, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.
Because determining the best way to improve performance can be complex, change only one variable at a time, and time the session both before and after the change. If session performance does not improve, you might want to return to your original configurations.

----------------------------------------------------------------------

Identifying the Performance Bottleneck
The first step in performance tuning is to identify the performance bottleneck. Performance bottlenecks can occur in the source and target databases, the mapping, the session, and the system. Generally, you should look for performance bottlenecks in the following order:
1.Target 2.Source 3.Mapping 4.Session 5.System

You can identify performance bottlenecks by running test sessions, viewing performance details, and using system monitoring tools.

1) Identifying Target Bottlenecks:
The most common performance bottleneck occurs when the PowerCenter Server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
If your session already writes to a flat file target, you probably do not have a target bottleneck. You can optimize session performance by writing to a flat file target local to the PowerCenter Server.
Causes for a target bottleneck may include small check point intervals, small database network packet size, or problems during heavy loading operations.

2) Identifying Source Bottlenecks
Performance bottlenecks can occur when the PowerCenter Server reads from a source database. If your session reads from a flat file source, you probably do not have a source bottleneck. You can improve session performance by setting the number of bytes the PowerCenter Server reads per line if you read from a flat file source.
If the session reads from relational source, you can use a filter transformation, a read test mapping, or a database query to identify source bottlenecks.
Using a Filter Transformation
You can use a filter transformation in the mapping to measure the time it takes to read source data. Add a filter transformation in the mapping after each source qualifier. Set the filter condition to false so that no data is processed past the filter transformation. If the time it takes to run the new session remains about the same, then you have a source bottleneck.
Using a Read Test Session
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformation in the mapping. Use the following steps to create a read test mapping:
1.Make a copy of the original mapping.
2.In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.
3.Remove all transformations.
4.Connect the source qualifiers to a file target.
Use the read test mapping in a test session. If the test session performance is similar to the original session, you have a source bottleneck.
Using a Database Query
You can identify source bottlenecks by executing the read query directly against the source database.
Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can load the result of the query in a file. On UNIX systems, you can load the result of the query in /dev/null.
Measure the query execution time and the time it takes for the query to return the first row. If there is a long delay between the two time measurements, you can use an optimizer hint to eliminate the source bottleneck.
Causes for a source bottleneck may include an inefficient query or small database network packet sizes.

3) Identifying Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you might have a mapping bottleneck. You can identify mapping bottlenecks by using a Filter transformation in the mapping.
If you determine that you do not have a source bottleneck, you can add a Filter transformation in the mapping before each target definition. Set the filter condition to false so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck.

You can also identify mapping bottlenecks by using performance details. High errorrows and rowsinlookupcache counters indicate a mapping bottleneck. High Rowsinlookupcache Counters
Multiple lookups can slow down the session. You might improve session performance by locating the largest lookup tables and tuning those lookup expressions.

4)Identifying a Session Bottleneck
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The PowerCenter Server creates performance details when you enable Collect Performance Data in the Performance settings on the Properties tab of the session properties.
Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.
Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicate a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
If a session contains Aggregator, Rank, or Joiner transformations, examine each Transformation_readfromdisk and Transformation_writetodisk counter. If these counters display any number other than zero, you can improve session performance by increasing the index and data cache sizes. The PowerCenter Server uses the index cache to store group information and the data cache to store transformed data, which is typically larger. Therefore, although both the index cache and data cache sizes affect performance, you will most likely need to increase the data cache size more than the index cache size.

Informatica Distributed Services Architecture:

One of the major differences between the two versions is that PowerCenter 7 had a client-server architecture whereas PowerCenter 8 has a service-oriented architecture.

�� Domains and nodes: PowerCenter 8.0 allows you to scale services and share resources across multiple machines. PowerCenter introduces a PowerCenter domain, the primary unit of administration for the PowerCenter environment, and a PowerCenter node, the logical representation of a machine in a domain.
�� Services: PowerCenter provides application and core services. Application services represent PowerCenter server-based functionality such as the Repository Service,Integration Service, and SAP BW Service. Core services support the domain and application services. Core services include a Log Service, Licensing Service, and Domain Service.
�� High availability: You can use the high availability option to eliminate single points of failure in the PowerCenter environment and reduce service interruptions in the event of failure. High availability provides resilience, failover, and recovery for services.
�� Administration: PowerCenter provides centralized administration through the Administration Console. You perform all administration tasks for the domain and services in the Administration Console. This includes administration of the Repository Service.

Command Line Programs:
�� infacmd. infacmd is a new command line program that allows you to administer PowerCenter domains and services. You can perform tasks such as enabling and disabling services and purging log events.
�� infasetup. infasetup is a new command line program that allows you to configure domains and nodes. Use infasetup to define domains and nodes, join domains, and configure node passwords, domain licenses, and domain service ports.
�� pmcmd. pmcmd includes new commands to support Integration Service functionality and new syntax to connect to a domain.
�� pmrep. pmrep now includes former pmrepagent commands and new syntax to connect to a domain.

Datatypes:
�� Flat file definitions: You can configure flat file definitions to use the Integer or Double datatype.

Performance Improvements:
�� Pushdown optimization: The Integration Service can execute SQL against the source or target database instead of processing the transformation logic within the Integration Service.
�� Partitioned flat file targets: The Integration Service can create an individual target file for all partitions, a merge file for all partitioned flat file target instances, and a file list that references the individual target files for all target partitions.
�� Append to flat file targets: The Integration Service can append data to existing flat file target and reject files.
�� Flat file source and target commands: You can specify a command for any flat file source or target instance in a session. You can use a command to generate source data and you can use a command to process target data.
�� Data compression" Repository data is compressed during communication among the Repository Service, Integration Service, and PowerCenter Client.

ETL - Informatica

Pages

Sunday, March 6, 2011

Performance Tuning

Informatica PowerCenter 8.0 New Features and Enhancements