Monday, June 2, 2014

ETL Basic Concepts and ETL Tools

ETL Concepts
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.
The first step in ETL process is mapping the data between source systems and target database (data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.
Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, and move) are sometimes used instead of ETL.
Glossary of ETL (Reference: www.Oracle.com)
Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping
The definition of the relationship and data flow between source and target objects.
Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.
Staging Area
A place where data is processed before entering the warehouse.
Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a source to a data warehouse.
Target System
A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.



Figure 1.12: Sample ETL Process Flow




What are ETL Tools?

ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making. Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL code created by programmers. This task was tedious and cumbersome in many cases since it involved many resources, complex coding and more work hours. On top of it, maintaining the code placed a great challenge among the programmers.
These difficulties are eliminated by ETL Tools since they are very powerful and they offer many advantages in all stages of ETL process starting from extraction, data cleansing, data profiling, transformation, debuggging and loading into data warehouse when compared to the old method.
There are a number of ETL tools available in the market to do ETL process the data according to business/technical requirements. Following are some those.
Popular ETL Tools
Tool Name
Company Name
Informatica
Informatica Corporation
DT/Studio
Embarcadero Technologies
DataStage
IBM
Ab Initio
Ab Initio Software Corporation
Data Junction
Pervasive Software
Oracle Warehouse Builder
Oracle Corporation
Microsoft SQL Server Integration
Microsoft
TransformOnDemand
Solonde
Transformation Manager
ETL Solutions



ETL Tools: What to Learn?
With the help of ETL tools, we can create powerful target Data Warehouses without much difficulty. Following are the various options that we have to know and learn in order to use ETL tools.
Software:
» How to install ETL tool on server/client?

Working with an ETL Tool:
» How to work with various options like designer, mapping, workflow, scheduling etc.,?
» How to work with sources like DBMS, relational source databases, files, ERPs etc., and    import the source definitions?
» How to import data from data modeling tools, applications etc.,?
» How to work with targets like DBMS, relational source databases, files, ERPs etc., and    import the source definitions?
» How to create target definitions?
» How to create mappings between source definitions and target definitions?
» How to create transformations?
» How to cleanse the source data?
» How to create a dimension, slowly changing dimensions, cube etc.,?
» How to create and monitor workflows?
» How to configure, monitor and run debugger?
» How to view and generate metadata reports? 

Tuesday, May 28, 2013

Global Temporary Tables in Oracle

Global temporary table exists solely for your session or whose data persists for the duration of your transaction.

When you create a temporary table, you can specify whether
- it should last for the duration of your session (on commit preserve rows)
- or whether its rows should be deleted when the transaction completes (on commit delete rows)

Unlike a permanent table, a temporary table does not automatically allocate space when it is created. Space will be dynamically allocated for the table as rows are inserted:

create global temporary table EMP
(
    EMPNO NUMBER(4),
    ENAME VARCHAR2(9),
    SALARY NUMBER
)
on commit preserve rows;

 Global temporary tables are very useful in optimizing a oracle transaction basically a large PL/SQL block, where we need to hold a large set of data temporary.




Tuesday, May 21, 2013

Create External Table from CSV File

Follow these simple Steps 
 
1) Create a Directory
2) Grant Read/Write Permission to that Directory
3) Place your CSV file in that directory at OS Level
4) Create EXTERNAL Table

For Example: 
 
1) create or replace directory MYCSV as 'E:\test'; 
 Note: E:\test has to be physical location on disk.
 
2) grant read, write on directory MYCSV to scott;

3) Put your csv file on E:\test at OS level

4) create table ext_table
   (
     ENAME VARCHAR2(10),
     DNAME VARCHAR2(14),
     JOB VARCHAR2(9)
   )
   organization external
   (
     type ORACLE_LOADER
     default directory MY_DIR
     access parameters 
     (
       records delimited by newline
       fields terminated by "~"
     )
     location ('testfile.csv')
   )
   reject limit unlimited;

   
To create this CSV File you may use following command

set pagesize 0 newpage 0 feedback off

select ENAME || '~' || DNAME || '~' || JOB from SCOTT.EMP, SCOTT.DEPT 
WHERE EMP.DEPTNO=DEPT.DEPTNO;

spool E:\test\testfile.csv
/
spool off


Oracle Default Username and Password

Here is the complete List of Default username and passwords

User Name Default Password Script (in $ORACLE_HOME) Additional Information
ANONYMOUS ANONYMOUS /rdbms/admin/catqm.sql Allows HTTP access to Oracle XML DB.
APEX_PUBLIC_USER APEX_PUBLIC_USER /apex/apexins.sql The Oracle Database Application Express account
BI CHANGE_ON_INSTALL / BI /demo/schema/bus_intelligence/bi_main.sql Sample Schema
CTXSYS CHANGE_ON_INSTALL / CTXSYS /ctx/admin/ctxsys.sql To administer Oracle Text
DBSNMP DBSNMP /rdbms/admin/catsnmp.sql Used by the Management Agent of OEM to monitor and manage DB
DIP DIP /rdbms/admin/catdip.sql for Oracle Internet Directory-enabled Oracle Label Security
EXFSYS EXFSYS /rdbms/admin/exfsys.sql Used internally and enables us to build complex PL/SQL rules and expressions
FLOWS_FILES FLOWS_FILES /apex/apexins.sql Owns objects created during installation of Oracle Database Application Express
HR CHANGE_ON_INSTALL / HR /demo/schema/human_resources/hr_main.sql Sample Schema
IX CHANGE_ON_INSTALL / IX /demo/schema/info_exchange/ix_main.sql Sample Schema
MDDATA MDDATA /md/admin/catmd.sql Used by Oracle Spatial Feature
MDSYS MDSYS /ord/admin/ordinst.sql The Oracle Spatial and Oracle Multimedia Locator administrator account
MGMT_VIEW MGMT_VIEW /sysman/admin/emdrep/bin/RepManager Used by Oracle Enterprise Manager Database Control
OE CHANGE_ON_INSTALL / OE /demo/schema/order_entry/oe_main.sql Sample Schema
OLAPSYS MANAGER / OLAPSYS /olap/admin/amdsys.sql Owns the OLAP Catalog, has been deprecated
ORACLE_OCM ORACLE_OCM /rdbms/admin/catocm.sql Associate the configuration information of Database instance with OracleMetaLink
ORDPLUGINS ORDPLUGINS /ord/admin/ordinst.sql Enables Oracle Database to store, manage and retrieve heterogeneous media data
ORDSYS ORDSYS /ord/admin/ordinst.sql Oracle Multimedia administrator account
OUTLN OUTLN /rdbms/admin/sql.bsq (after recovery) Provides plan stability prevents certain db environment changes from affecting performance characteristics
OWBSYS OWBSYS /owb/UnifiedRepos/cat_owb.sql For administrating the Oracle Warehouse Builder repository
PM CHANGE_ON_INSTALL / PM /demo/schema/product_media/pm_main.sql Sample Schema
SCOTT TIGER /rdbms/admin/utlsampl.sql Sample Schema
SH CHANGE_ON_INSTALL / SH /demo/schema/sales_history/sh_main.sql Sample Schema
SI_INFORMTN_SCHEMA SI_INFORMTN_SCHEMA /ord/admin/ordinst.sql Stores the information views for the SQL/MM Still Image Standard
SPATIAL_CSW_ADMIN_USR SPATIAL_CSW_ADMIN_USR /md/admin/sdocswpv.sql Used by Oracle Spatial CSW Cache Manager to load record-type metadata andrecord instances
SPATIAL_WFS_ADMIN_USR SPATIAL_CSW_ADMIN_USR /md/admin/sdowfspv.sql Used by Oracle Spatial WFS Cache Manager to load all feature type metadata and feature instances
SYS CHANGE_ON_INSTALL /rdbms/admin/sql.bsq (after recovery) An account used to perform database administration tasks
SYSMAN OEM_TEMP /rdbms/admin/sql.bsq (after recovery) The account used to perform Oracle Enterprise Manager administration tasks
SYSTEM MANAGER /rdbms/admin/sql.bsq (after recovery) Default generic database administrator account
WMSYS WMSYS /rdbms/admin/owmctab.plb Used to store the metadata information for Oracle Workspace Manager.
XDB CHANGE_ON_INSTALL / XDB /rdbms/admin/catqm.sql Provides high-performance XML storage and retrieval
XS$NULL
/rdbms/admin/sql.bsq (after recovery) Represents the absence of a user in a session


Monday, May 6, 2013

Documents that may help you get a job



I have 4 documents that may help you get a job.

1) A checklist with 160 items.
The items are divided into these categories:
• Marketing yourself
• Keeping track of the places you submit your resume to
• Preparing for the interview
• Immediately before the interview
• During the interview
• Immediately after the interview
• After a rejection letter
• After a offer letter
• Business cards and signature block for all correspondence

2) an outline of important things to say and do at the interview.
A document to take to the interview. There are 63 items on this outline. The items are divided into these categories:
• Tell me about yourself
• The top 3 points you want to make sure you get across.
• What makes me different:•
• Job related questions to ask at the beginning of the interview. For example:
• Job related questions to ask at the end of the interview. For example:
• My Strengths:
• My Weaknesses:
• Future employment goals
• Gap in work
• Elevator speech

3) An excel spreadsheet used to keep track of:
• The places you have applied for work at
• Each time you have communicated with those places.
• What was said and if it was an email, phone call, in person interview, phone interview etc.
• The phone number, address etc. of the head hunters and employers

4) A document is all about cover letters.
Fallow up letters, letters of introduction, post interview letter, post rejection latter etc