Posts

Showing posts from June, 2019

Exception Transformation

- Bad Record Exception
- Duplicate Record Exception

Match Transformation

Match Output

System Ports, when Clusters is selected:
- ClusterID
- GroupKey
- ClusterSize
- RowID: source record ID.
- DriverID: maximum record ID in that group.
- DriverScore: the record with the maximum RowID in the group acts as the trigger record.
- LinkID: minimum record ID in that group.
- LinkScore: confidence level. For the link score, the record with the lowest RowID in the group acts as the trigger record.

System Ports, when Matched Pairs is selected:
- GroupKey
- RowID
- RowID1
- DriverScore

Match Type:
- Field Match (Single Source)

Strategies:
- Field Matching: Bigram Distance, Edit Distance, Hamming Distance, Jaro Distance, Reverse Hamming Distance
- Identity Matching
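
The field-matching strategies listed here are standard string-comparison measures. As a rough illustration, here is a minimal Python sketch of three of them using textbook definitions. The function names are hypothetical, and the scores are not guaranteed to match what the Match transformation computes internally.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x -> y
            ))
        previous = current
    return previous[-1]


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the sets of adjacent character pairs."""
    pairs_a = {a[i:i + 2] for i in range(len(a) - 1)}
    pairs_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not pairs_a and not pairs_b:
        return 1.0 if a == b else 0.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))
```

Edit distance tolerates insertions and deletions, while Hamming distance only counts substitutions, so it suits fixed-length codes better than free-text names.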

Merge Transformation

It is a passive transformation. It is used to concatenate fields.

IDQ - Parser Transformation

You can use regular expressions (regex) to validate and standardize the data.
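
To make the idea concrete, here is a small Python sketch of regex-based validation and standardization. The phone-number pattern and the `standardize_phone` name are assumptions made for illustration; they are not taken from the Parser transformation itself.

```python
import re

# Hypothetical pattern: optional +1/1 prefix, then 10 digits with common separators.
US_PHONE = re.compile(
    r"^\s*(?:\+?1[\s.-]*)?\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})\s*$"
)


def standardize_phone(raw: str):
    """Return the number as NNN-NNN-NNNN, or None when it does not validate."""
    match = US_PHONE.match(raw)
    if match is None:
        return None
    return "-".join(match.groups())
```

A value that fails the pattern returns `None`, which is the kind of record you would route to a bad-record output rather than standardize.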

Introduction to IDQ

Introducing Informatica Analyst

Informatica Analyst is a web-based application client that analysts can use to analyze, cleanse, standardize, profile, and score data in an enterprise. Depending on your license, business analysts and developers use the Analyst tool for data-driven collaboration. You can perform column and rule profiling, scorecarding, and bad record and duplicate record management. You can also manage reference data and provide the data to developers in a data quality solution.

Informatica Domain

Informatica has a service-oriented architecture that provides the ability to scale services and to share resources across multiple machines. The Informatica domain is the primary unit for the management and administration of services. You can log in to Informatica Administrator after you install Informatica. You use the Administrator tool to manage the domain and configure the required application services before you can access the remaining application clients.

Application Client: Informatica Analyst
- Application Services: Analyst Service, Content Management Service, Data Integration Service, Model Repository Service, Search Service
- Repositories: Model repository

Application Client: Informatica Developer
- Application Services: Analyst Service, Content Management Service, Data Integration Service, Model Repository Service...
- Repositories: Model repository

IDQ - Analyst Tool

URL is http://servername:port/AnalystTool/

IDQ - Data Profiling

Profiling is used to analyze and understand the data. It is done using the IDQ Analyst web client. It helps you understand the data quality and configure the trust values accordingly in Informatica MDM.

Data Quality and Profiling

Use the data quality capabilities in the Developer tool to analyze the content and structure of your data. You can enhance the data in ways that meet your business needs. Use the Developer tool to design and run processes that achieve the following objectives:
- Profile data. Profiling reveals the content and structure of your data. Profiling is a key step in any data project as it can identify strengths and weaknesses in your data and help you define your project plan.
- Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.
- Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, an...
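
As a rough picture of what a column profile and a scorecard measure, here is a minimal Python sketch. The function names and the rule for what counts as null are assumptions for illustration; the Analyst and Developer tools compute their own, richer statistics.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, top values."""
    # Treat None and blank strings as null, a simplifying assumption.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


def scorecard(values, is_valid):
    """Percentage of rows that pass a validity rule, as a scorecard would show."""
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if is_valid(v)) / len(values)
```

For example, `scorecard(column, lambda v: v in ("M", "F"))` gives the share of rows that already hold a standard gender code.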

IDQ Architecture

4 Tiers

Client Layer:
- Informatica Analyst: web-based thin client
- Informatica Developer: Eclipse-based thick client

Application Server Layer:
- Data Quality Server (DIS)

Meta Layer:
- Informatica Repository (MRS)

Content Layer:
- Address Doctor Reference Data

IDQ - Consolidation Transformation

Characteristics:
- Creates unique/master records.
- Is used to dedupe duplicate data.
- Sometimes these unique records are referred to as golden records.
- Is used almost every time after the Exact Match or Fuzzy Match transformation.
- Works on a Group By key that you define, a key that identifies a unique record, like customer ID.
- Feeds as input to a Human Task accessed through the IDQ Analyst tool.

System Ports:
- IsSurvivor: N = non-surviving record, Y = master record.

Properties:
- Advanced - Output Mode - All: input and consolidated data are passed to the next transformation.
- Advanced - Output Mode - Survivor Only: only consolidated data is passed to the next transformation.

Strategies:
- Simple: Highest Row ID (the default), Maximum, Minimum, Longest, Shortest, Most Frequent, Most Frequent Non-Null, Average
- Row Based:
  - Most Data: the length of the full record/row.
  - Most Filled: the least amount of nulls/blanks.
  - Modal Exact: most frequent non-blanks ...
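
The survivor logic can be sketched in a few lines of Python. This is only an illustration of a Most Filled style rule with an IsSurvivor flag and the two output modes; the function names, the tie-breaking rule, and the dictionary record layout are assumptions, not the transformation's actual implementation.

```python
from collections import defaultdict


def filled_count(record):
    """Number of fields that are neither None nor blank."""
    return sum(1 for v in record.values() if v is not None and str(v).strip() != "")


def consolidate(records, group_key, survivor_only=False):
    """Flag one survivor per group using a Most Filled style rule."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    output = []
    for members in groups.values():
        # max() keeps the first record on ties, an arbitrary choice here.
        survivor = max(members, key=filled_count)
        for record in members:
            flagged = dict(record, IsSurvivor="Y" if record is survivor else "N")
            if flagged["IsSurvivor"] == "Y" or not survivor_only:
                output.append(flagged)
    return output
```

Calling `consolidate(rows, "customer_id")` mirrors Output Mode All, while `survivor_only=True` mirrors Survivor Only.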

Address Doctor/Validator Transformation

How To Configure

Prerequisites:
- Access to the Admin Console to create services.
- Address Doctor reference files (*.MD files).
- Address Doctor license key.

Steps:
1. Unzip any zipped reference data. Address Doctor reference files look like DB5_USA_5B1_140401.
2. Copy the unzipped reference files to a server directory.
3. Create the Content Management Service (CMS) in the Admin Console.
4. Provide the reference data file directory location under the CMS properties.
5. Provide the Address Doctor license key under the CMS properties. Note: separate the keys with a comma if you have more than one.
6. Specify the preloading options under CMS (usually all).
7. Leave the rest of the properties as default or blank.
8. Restart the CMS and DIS services.

Characteristics:
- You need a separate license.
- It is not installed out of the box and needs to be installed and configured in IDQ separately.
- Is used to standardize addresses.
- Reduces returned mail and thus decreases cost.
- Gives a status code (metadata) about that particu...

Labeler and Standardizer Transformation

Both are used for data cleansing and standardization of addresses and company names (like LTD for Limited, TAS for Trading As).

Labeler

Characteristics:
- Labels different incoming values, e.g. #$^& as symbols or S 017242 as 99999.

Properties/Strategies:
- Label Using Reference Table
- Label Using Character Set

Standardizer

Characteristics:
- Is used for standardizing the data, e.g. AVE or AVNUE to AVENUE.

Properties/Strategies:
- Offers removal of spaces
- Label Using Character Set
- Remove Reference Table Matches
- Remove Custom Strings
- Replace Reference Table Matches with Valid Values
- Replace Reference Table Matches with Custom Values
- Replace Custom Strings
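
A small Python sketch can show the difference between the two. The labeling scheme (A for letters, 9 for digits, _ for spaces, S for symbols), the `STREET_SUFFIXES` table, and the function names are all hypothetical choices for illustration; the real transformations are configured through strategies and reference tables rather than code.

```python
import re


def label_characters(value: str) -> str:
    """Character-set labeling: letters -> A, digits -> 9, spaces -> _, other -> S."""
    labels = []
    for ch in value:
        if ch.isalpha():
            labels.append("A")
        elif ch.isdigit():
            labels.append("9")
        elif ch.isspace():
            labels.append("_")
        else:
            labels.append("S")
    return "".join(labels)


# Hypothetical reference table: variant -> valid value.
STREET_SUFFIXES = {"AVE": "AVENUE", "AVNUE": "AVENUE", "LTD": "LIMITED"}


def standardize(value: str, reference=STREET_SUFFIXES) -> str:
    """Collapse extra spaces, then replace reference-table matches token by token."""
    tokens = re.sub(r"\s+", " ", value.strip()).upper().split(" ")
    return " ".join(reference.get(token, token) for token in tokens)
```

The labeler only describes the shape of a value; the standardizer is the step that changes it.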

KeyGen Transformation

Characteristics:
- Is used in conjunction with the Match transformation, which needs some ports from KeyGen.
- Based on the strategy used, it creates a key that the Match transformation can use to compare data.
- Only records with the same key are compared by the Match transformation, which leads to better Match performance. For example, names like Ramit Girdhar, Remit Girdhar, and Ramit Gerdhar all get the same key from the Key Generator if we use Soundex, so the Match transformation only has to compare records with the same key.
- Always generates the same key for the same values.
- Is also used in scenarios where an incoming source doesn't have a key.

Different strategies to generate keys:
- String: just a concatenation of exact strings.
- Soundex: uses phonetics to generate keys (fuzzy matching).
- NYSIIS: takes the vowels throughout the string into account, whereas Soundex ignores vowels after the first letter. It converts all letters t...
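
To see why the three example names land in the same group, here is a minimal Python sketch of the standard American Soundex algorithm, applied word by word. The helper names `group_key` and `group_by_key` are hypothetical, and the Key Generator's own Soundex settings (such as key length) may produce different keys.

```python
from collections import defaultdict

SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Standard four-character Soundex code for a single word."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            key += code
        previous = code  # a vowel resets the previous code to ""
    return (key + "000")[:4]


def group_key(full_name: str) -> str:
    """Build a key from the Soundex code of each word in the name."""
    return " ".join(soundex(part) for part in full_name.split())


def group_by_key(names):
    """Bucket names by key so that only same-key names need comparing."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name)].append(name)
    return dict(groups)
```

With this sketch, Ramit Girdhar, Remit Girdhar, and Ramit Gerdhar all share one key, so a matcher would compare them with each other and skip every record in other buckets.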

IDQ Transformations

Address Validator, Aggregator, Association, Case Converter, Classifier, Comparison, Consolidation, Data Masking, Data Processor, Decision, Exception, Expression, Filter, Java, Joiner, Key Generator, Labeler, Lookup, Match, Merge, Parser, Rank

IDQ Functions

Text/String Functions: INITCAP, SUBSTR
Logical: ISNULL, NOT

Glossary

Informatica Developer - Eclipse-based GUI for developing IDQ mappings
DO - Data Object, which is nothing but a source/target definition
PDO - Physical Data Object

Use Cases of IDQ

- To standardize gender values like M,F,Male,MaLe, F,Female,Unkown,Unk,Null,NULL to possibly 3 values: MALE, FEMALE, UNKNOWN.
- To convert a name to upper case, lower case, or CamelCase (InitCap).
- To substring a zip code to 5 characters.
- To filter nulls.
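
These use cases are simple enough to sketch in Python. The function names and the `GENDER_MAP` table are assumptions for illustration; in IDQ you would build the same logic with transformations and expression functions rather than code.

```python
GENDER_MAP = {
    "M": "MALE", "MALE": "MALE",
    "F": "FEMALE", "FEMALE": "FEMALE",
}


def standardize_gender(raw):
    """Map any spelling to MALE, FEMALE, or UNKNOWN."""
    if raw is None:
        return "UNKNOWN"
    return GENDER_MAP.get(str(raw).strip().upper(), "UNKNOWN")


def init_cap(name: str) -> str:
    """Capitalize the first letter of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in name.split())


def zip5(zip_code: str) -> str:
    """Keep the first five characters of a zip code."""
    return zip_code.strip()[:5]


def drop_nulls(records, field):
    """Filter out records whose field is None or blank."""
    return [r for r in records if r.get(field) not in (None, "")]
```

Anything that is not a recognized male or female spelling, including misspellings and literal "NULL" strings, falls through to UNKNOWN.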

Why IDQ over Power Center

- Data preview at any step of the mapping.
- Match and Consolidation transformations for fuzzy matching to remove duplicates.
- Additional transformations for data quality and cleansing. IDQ is mainly for data cleansing.
- Much faster development and unit testing process.
- Easy integration with Power Center and MDM.

IDQ Services & Main Components

- Model Repository Service (MRS): similar to the Repository Manager in Power Center.
- Data Integration Service (DIS)
- Analyst Service (AS)
- Human Task (HT): for things that cannot be automated, like merging data where the system is not able to determine a match or link.