Data Mining Assignment

Assignment # 01

Q1) Explain Data, Information, Knowledge and Intelligence.

A1) Data: Data are raw, unorganized facts that need to be processed.

Information: Data becomes information when it is processed, organized, and structured in a given context to make it useful or meaningful.

Knowledge: It is the expertise to infer results from the information you have obtained.

Intelligence: It is defined as the ability to solve complex problems or make decisions with outcomes benefiting the actor.

--------------------

Q2) What is Data Mining? Explain the major issues in Data Mining.

A2) Data mining can be defined as the extraction of useful information from a large amount of data.

Another term for data mining is KDD (Knowledge Discovery in Databases).

Issues in Data Mining:-

Data mining is not an easy task: the algorithms used can get very complex, and the data is not always available in one place, so it needs to be integrated from various heterogeneous data sources. These factors give rise to the following issues:-

* Mining Methodology and User Interaction

* Performance Issues

* Diverse Data Types Issues

----------------------

Q3) What are the different forms of Data Pre-Processing?

A3) Different forms of Data Pre-Processing are :-

1. Data Cleaning - It cleans the data by filling in missing values, smoothing noisy data, resolving inconsistencies, and removing outliers.

2. Data Transformation - It is a technique that transforms the data into alternate forms appropriate for the mining technique, for example by normalization or aggregation.

3. Data Reduction - Data reduction is the process of reducing the volume of data, and with it the capacity required to store the data. Data reduction can increase storage efficiency and reduce costs.

4. Data Integration -  Data integration is the process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL mapping, and transformation. 
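As an illustration of the data transformation step, here is a minimal sketch of min-max normalization, which rescales an attribute into a fixed range. The function name `min_max_normalize` and the sample `incomes` values are made up for this example; they do not come from the assignment.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale numeric values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to new_min
        # instead of dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical sample attribute (illustrative data only).
incomes = [12000, 35000, 58000, 98000]
normalized = min_max_normalize(incomes)
print(normalized)
```

The smallest value maps to 0 and the largest to 1, so attributes measured on very different scales become comparable before mining.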

-----------------------

Q4) What do you mean by noisy data? How do you handle it?

A4) Noise is a random error or variance in a measured variable. Noisy data is unwanted, meaningless, or corrupt data: any data that cannot be understood or interpreted correctly by a system.

Techniques to remove noisy data:-

1. Binning - It smooths sorted data by distributing the values into bins and replacing each value with, for example, the mean or median of its bin. This also reduces the cardinality of continuous and discrete data.

2. Regression - It is a data mining technique that fits an equation to a data set; the fitted function can then be used to smooth the data.

3. Clustering - Groups, or clusters, are formed from data having similar characteristics; values that fall outside every cluster can be treated as outliers.
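The binning technique can be sketched as follows. This is one common variant (equal-frequency bins, smoothing by bin means), not the only way to bin data; the function name `smooth_by_bin_means` and the sample `prices` are assumptions made for the example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical sample data (illustrative only).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# → [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Nine distinct-looking readings collapse to three values, which removes small random fluctuations inside each bin.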

---------------

Q5) What are the different techniques to handle missing values?

A5) Ways to handle missing data during data cleaning -

1. Manual Entry of missing data

2. Using Attribute Mean

3. Using Global Constant

4. Ignoring the tuple

5. Using the most probable value
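Three of the techniques above (ignoring the tuple, using a global constant, and using the attribute mean) can be sketched in plain Python. The sketch assumes that missing values are stored as `None` and that each record is a dictionary; the helper names and the sample `rows` are hypothetical.

```python
def drop_incomplete(rows):
    """Ignore the tuple: keep only rows that have no missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def fill_with_constant(rows, attribute, constant="Unknown"):
    """Global constant: replace every missing value of the attribute
    with the same placeholder."""
    return [
        {**row, attribute: constant if row[attribute] is None else row[attribute]}
        for row in rows
    ]


def fill_with_mean(rows, attribute):
    """Attribute mean: replace missing values with the mean of the known ones."""
    known = [row[attribute] for row in rows if row[attribute] is not None]
    if not known:
        raise ValueError("no known values to average")
    mean = sum(known) / len(known)
    return [
        {**row, attribute: mean if row[attribute] is None else row[attribute]}
        for row in rows
    ]


# Hypothetical sample records (illustrative only); None marks a missing value.
rows = [
    {"name": "A", "age": 20},
    {"name": "B", "age": None},
    {"name": "C", "age": 30},
]
print(drop_incomplete(rows))
print(fill_with_constant(rows, "age"))
print(fill_with_mean(rows, "age"))
```

Each helper returns new records and leaves the input untouched, so the strategies can be compared on the same data.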

---------------
