Data Flow Task

A Data Flow Task (DFT) is, quite literally, a task through which data flows. That is exactly what ETL does: Extract, Transform, and Load data. Most of the ETL work in a package is done inside the Data Flow Task.
Recall how we connect tasks in the Control Flow with precedence constraints. We connect components in a Data Flow Task with arrows as well, but they are not called precedence constraints. These data paths come in just two colours: green for valid records and red for rows that fail whatever rules we have set up. A conceptual sketch of this routing follows below.

Also unlike the Control Flow, the components in a Data Flow cannot have an arbitrary number of inputs and outputs. These are predefined for each component, the main exception being the Multicast transform, which can be configured with any number of outputs. For example, the OLE DB Source takes no input and provides one success (green) output and one error (red) output; each source, transform, and destination has a similar fixed rule.
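To make the green/red idea concrete, here is a minimal sketch in plain C#. It is only a conceptual model of row routing, not the SSIS API; the string-based row type and the validity rule are assumptions for illustration.

```csharp
// Conceptual sketch only (plain C#, not the SSIS object model): each row a
// component processes is sent down the green (success) or red (error) path.
using System;
using System.Collections.Generic;

class RowRouter
{
    // isValid stands in for whatever rules we configure on the component.
    static (List<string> Green, List<string> Red) Route(
        IEnumerable<string> rows, Func<string, bool> isValid)
    {
        var green = new List<string>();
        var red = new List<string>();
        foreach (var row in rows)
        {
            if (isValid(row)) green.Add(row);  // green arrow: valid records
            else red.Add(row);                 // red arrow: rows failing our setup
        }
        return (green, red);
    }

    static void Main()
    {
        // Treat rows as valid only when they parse as integers.
        var (ok, bad) = Route(new[] { "42", "abc", "7" }, r => int.TryParse(r, out _));
        Console.WriteLine($"green: {ok.Count}, red: {bad.Count}"); // green: 2, red: 1
    }
}
```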
Steps for adding a Data Flow Task
A Data Flow Task has to be part of a Control Flow. One Control Flow can contain any number of DFTs, and the Control Flow is where we decide how those multiple DFTs are arranged and sequenced in our solution.
Each DFT must have a Source; it may or may not have a Transform, and it may or may not have a Destination. However, the Source must be followed by at least one Transform or Destination. A Source cannot simply extract data and do nothing with it; if that is the case, the DFT will fail.
Ideally, a DFT has a Source, an optional Transform, and a Destination. The Source extracts the data, the Transform reshapes it, and the Destination loads it into a target. That is how ETL is implemented in a DFT, and a minimal sketch of the pattern is shown below.
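As a rough illustration of the Source → Transform → Destination shape, here is a tiny ETL sketch in plain C# (not SSIS itself); the file names and the uppercase transform are hypothetical.

```csharp
// Minimal ETL sketch in plain C#, mirroring a DFT's shape:
// a Source extracts rows, an optional Transform reshapes them,
// and a Destination loads the result. File names are hypothetical.
using System.IO;
using System.Linq;

class MiniEtl
{
    static void Main()
    {
        // Extract: the Source reads raw records (here, a flat file).
        var rows = File.ReadLines("customers.csv");

        // Transform: a Derived Column-style step computed per row.
        var transformed = rows.Select(r => r.Trim().ToUpperInvariant());

        // Load: the Destination writes the transformed rows out.
        File.WriteAllLines("customers_clean.csv", transformed);
    }
}
```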
The Data Flow tab of the Package Designer is where you specify the details of any Data Flow Tasks that you've added on the Control Flow tab. Data Flows are made up of various objects that you drag and drop from the Toolbox (a programmatic sketch follows the list below):
- Data Flow Sources are the ways that data gets into the system. The first table below lists the available data flow sources.
- Data Flow Transformations let you alter and manipulate the data in various ways. The second table below lists the available data flow transformations.
- Data Flow Destinations are the places where you can send the transformed data. The third table below lists the available data flow destinations.
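The same thing can be done in code. Below is a hedged sketch using the SSIS object model (it requires references to the Microsoft.SqlServer.ManagedDTS and Microsoft.SqlServer.DTSPipelineWrap assemblies); the component class ID and the output path are assumptions that vary by SQL Server version.

```csharp
// Sketch: adding a Data Flow Task to a package via the SSIS object model.
// Exact class IDs and interface versions vary by SQL Server release.
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class BuildPackage
{
    static void Main()
    {
        var pkg = new Package();

        // "STOCK:PipelineTask" is the moniker for the Data Flow Task.
        Executable exe = pkg.Executables.Add("STOCK:PipelineTask");
        var taskHost = (TaskHost)exe;
        taskHost.Name = "Demo DFT";

        // The MainPipe is the data-flow surface where sources, transforms,
        // and destinations are added as component metadata.
        var pipe = (MainPipe)taskHost.InnerObject;
        IDTSComponentMetaData100 source = pipe.ComponentMetaDataCollection.New();
        source.ComponentClassID = "DTSAdapter.OleDbSource"; // ID varies by version

        // Save the package; the path is hypothetical.
        new Microsoft.SqlServer.Dts.Runtime.Application()
            .SaveToXml(@"C:\temp\demo.dtsx", pkg, null);
    }
}
```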
Just as the Control Flow contains tasks, a DFT contains Sources, Transforms, and Destinations. Strictly speaking they are not called tasks, even though many people refer to them that way. Let's list a few examples of each:
Data Sources:

| Source | Use |
|---|---|
| ADO NET | Extracts data from a database using a .NET data provider |
| Excel | Extracts data from an Excel workbook |
| Flat File | Extracts data from a flat file |
| OLE DB | Extracts data from a database using an OLE DB provider |
| Raw File | Extracts data from a raw file (proprietary Microsoft format) |
| XML | Extracts data from an XML file |
Transformations:

| Transformation | Effect |
|---|---|
| Aggregate | Aggregates and groups values in a dataset |
| Audit | Adds audit information to a dataset |
| Cache Transform | Populates a Cache connection manager |
| Character Map | Applies string operations to character data |
| Conditional Split | Evaluates and splits up rows in a dataset |
| Copy Column | Copies a column of data |
| Data Conversion | Converts data to a different data type |
| Data Mining Query | Runs a data mining query |
| Derived Column | Calculates a new column from existing data |
| Export Column | Exports data from a column to a file |
| Fuzzy Grouping | Groups rows that contain similar values |
| Fuzzy Lookup | Looks up values using fuzzy matching |
| Import Column | Imports data from a file to a column |
| Lookup | Looks up values in a reference dataset |
| Merge | Merges two sorted datasets |
| Merge Join | Merges data from two datasets by using a join |
| Multicast | Creates copies of a dataset |
| OLE DB Command | Executes a SQL command on each row in a dataset |
| Percentage Sampling | Extracts a percentage-based subset of rows from a dataset |
| Pivot | Builds a pivot table from a dataset |
| Row Count | Counts the rows of a dataset |
| Row Sampling | Extracts a sample of rows from a dataset |
| Script Component | Executes a custom script |
| Slowly Changing Dimension | Updates a slowly changing dimension table |
| Sort | Sorts data |
| Term Extraction | Extracts terms from text in a column |
| Term Lookup | Looks up the frequency of a term in a column |
| Union All | Merges multiple datasets |
| Unpivot | Normalizes a pivot table |
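To illustrate one of these, here is a conceptual sketch of what the Lookup transform does, written in plain C# rather than SSIS; the reference data and the match/no-match outputs are assumptions for illustration.

```csharp
// Conceptual sketch of the Lookup transform (plain C#, not SSIS): each incoming
// row is matched against a reference dataset; matches flow to the match output,
// the rest to the no-match (or error) output. Data is hypothetical.
using System;
using System.Collections.Generic;

class LookupSketch
{
    static void Main()
    {
        // Reference dataset: customer id -> region.
        var reference = new Dictionary<int, string> { [1] = "East", [2] = "West" };

        int[] incomingIds = { 1, 2, 3 };
        foreach (var id in incomingIds)
        {
            if (reference.TryGetValue(id, out var region))
                Console.WriteLine($"match:    {id} -> {region}"); // match output
            else
                Console.WriteLine($"no match: {id}");             // no-match output
        }
    }
}
```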
Destinations:

| Destination | Use |
|---|---|
| ADO NET | Sends data to a .NET data provider |
| Data Mining Model Training | Sends data to an Analysis Services data mining model |
| DataReader | Sends data to an in-memory ADO.NET DataReader |
| Dimension Processing | Processes a cube dimension |
| Excel | Sends data to an Excel worksheet |
| Flat File | Sends data to a flat file |
| OLE DB | Sends data to an OLE DB database |
| Partition Processing | Processes an Analysis Services partition |
| Raw File | Sends data to a raw file |
| Recordset | Sends data to an in-memory ADO Recordset |
| SQL Server Compact | Sends data to a SQL Server CE database |
| SQL Server | Sends data to a SQL Server database |