Posts

What is Data Quality

Data quality can be defined as the degree to which a set of characteristics of data fulfills requirements. Examples of characteristics are completeness, validity, accuracy, consistency, availability, and timeliness. A requirement is a need or expectation that is stated, generally implied, or obligatory. Other standards also address data quality. For example, ISO 25012 defines 15 quality dimensions of data. [11] The number and names of the quality dimensions (characteristics) can also depend on the source of the data. For instance, the following dimensions can be defined for data in Web 2.0 documents: accessibility, completeness, credibility, involvement, objectivity, readability, relevance, reputation, style, timeliness, uniqueness, usefulness. [1]
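As a rough illustration of how two of these characteristics could be measured, the sketch below computes completeness and validity for a single field. The sample records, the field name, and the simplified email pattern are all assumptions made for this example; they do not come from ISO 25012 or from any Informatica product.

```python
import re

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "not-an-email"},
    {"id": 4, "email": "bob@example.org"},
]

# Simplified email pattern, used only for illustration.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def completeness(rows, field):
    """Fraction of rows in which `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def validity(rows, field, pattern):
    """Fraction of non-empty values of `field` that fully match `pattern`."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


print(completeness(records, "email"))
print(validity(records, "email", EMAIL_RE))
```

Here the requirement is expressed as a rule (the field must be filled, the value must match a pattern), and the score is the share of records that fulfil it.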

Exception Transformation

 - Bad Record Exception
 - Duplicate Record Exception

Match Transformation

Match Output

System Ports - when Clusters is selected:
 - ClusterID
 - GroupKey
 - ClusterSize
 - RowID: source record ID
 - DriverID: maximum record ID in the group
 - DriverScore: the record with the maximum RowID in the group acts as the trigger record
 - LinkID: minimum record ID in the group
 - LinkScore: confidence level; for the link score, the record with the lowest RowID in the group acts as the trigger record

System Ports - when Matched Pair is selected:
 - GroupKey
 - RowID
 - RowID1
 - DriverScore

Match Type: Field Match (Single Source)

Strategies

Field Matching:
 - Bigram Distance
 - Edit Distance
 - Hamming Distance
 - Jaro Distance
 - Reverse Hamming Distance

Identity Matching
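To give a feel for what some of the field matching strategies listed above measure, here is a minimal sketch of the textbook Hamming distance, edit (Levenshtein) distance, and a bigram similarity based on the Dice coefficient. These are generic definitions written for illustration; the function names are made up, and Informatica's own implementations and score normalization may differ.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def bigram_similarity(a, b):
    """Dice coefficient over the sets of adjacent character pairs."""
    pairs_a = {a[i:i + 2] for i in range(len(a) - 1)}
    pairs_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not pairs_a and not pairs_b:
        return 1.0 if a == b else 0.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


print(hamming_distance("karolin", "kathrin"))
print(edit_distance("kitten", "sitting"))
print(bigram_similarity("night", "nacht"))
```

Hamming distance only compares strings position by position, so it suits fixed-length codes, while edit distance and bigram similarity tolerate inserted or missing characters and are a better fit for free-text fields such as names.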

Merge Transformation

It is a passive transformation. It is used to concatenate fields.
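Conceptually, the concatenation works like the sketch below: each input row yields exactly one output row (which is what "passive" means here), with several fields joined into one value. The function name `merge_fields`, the field names, and the separator are hypothetical and only illustrate the idea; this is not Informatica code.

```python
def merge_fields(row, fields, separator=" "):
    """Join the non-empty values of `fields` into a single string."""
    parts = [str(row[f]).strip() for f in fields if row.get(f) not in (None, "")]
    return separator.join(p for p in parts if p)


# Hypothetical input row with a missing middle name.
row = {"first_name": "Ann", "middle_name": None, "last_name": "Lee"}
full_name = merge_fields(row, ["first_name", "middle_name", "last_name"])
print(full_name)
```

Skipping empty values avoids doubled separators when an optional field is missing.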

IDQ - Parser Transformation

You can use regular expressions (regex) to validate or standardize the data.
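The idea can be sketched outside the tool with an ordinary regular expression. The example below assumes a 10-digit phone number format and a target layout of NNN-NNN-NNNN; both the pattern and the function name `standardize_phone` are made up for illustration and are not taken from the Parser transformation itself.

```python
import re

# Assumed input formats: "(555) 123-4567", "555.123.4567", "5551234567", ...
PHONE_RE = re.compile(r"^\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$")


def standardize_phone(raw):
    """Return 'NNN-NNN-NNNN' if `raw` looks like a 10-digit number, else None."""
    match = PHONE_RE.match(raw.strip())
    if match is None:
        return None  # validation failed
    return "-".join(match.groups())  # standardized form


print(standardize_phone("(555) 123-4567"))
print(standardize_phone("12345"))
```

The same pattern does both jobs: a failed match flags the value as invalid, and the captured groups are reassembled into one consistent format.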

Introduction to IDQ

Introducing Informatica Analyst

Informatica Analyst is a web-based application client that analysts can use to analyze, cleanse, standardize, profile, and score data in an enterprise. Depending on your license, business analysts and developers use the Analyst tool for data-driven collaboration. You can perform column and rule profiling, scorecarding, and bad record and duplicate record management. You can also manage reference data and provide the data to developers in a data quality solution.

Informatica Domain

Informatica has a service-oriented architecture that provides the ability to scale services and to share resources across multiple machines. The Informatica domain is the primary unit for the management and administration of services. You can log in to Informatica Administrator after you install Informatica. You use the Administrator tool to manage the domain and configure the required application services before you can access the remaining application clients.

Application Client: Informatica Analyst
Repositories: Model repository
Application Services:
 - Analyst Service
 - Content Management Service
 - Data Integration Service
 - Model Repository Service
 - Search Service

Application Client: Informatica Developer
Repositories: Model repository
Application Services:
 - Analyst Service
 - Content Management Service
 - Data Integration Service
 - Model Repository Service...