1 (a)
Define KDD. How are data mining techniques applied to multimedia databases,
temporal databases and spatial databases to extract useful knowledge?
7 M
1 (b)
What is a concept hierarchy? List and explain the types of concept hierarchies in detail.
7 M
2 (a)
What is data cleaning? Discuss various ways of handling missing values during
data cleaning.
7 M
2 (b) (i)
Explain the Star and Fact Constellation (Galaxy) schemas used in a data warehouse for
a multidimensional database.
3 M
2 (b) (ii)
Differentiate between OLAP and OLTP.
4 M
2 (c) (i)
What is a cuboid? Explain various OLAP operations on a data cube with a
suitable example.
3 M
2 (c) (ii)
Differentiate between a fact table and a dimension table.
4 M
3 (a)
Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
i) Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0].
ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.
7 M
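A minimal Python sketch for checking a hand-worked answer to 3 (a). The helper names `min_max` and `z_score` are illustrative and not part of the question. The mean is computed from the listed values, and the standard deviation of 20.64 is taken from the question (it matches the sample standard deviation of the listed values).

```python
from statistics import mean, stdev

ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]

def min_max(v, values, new_min=0.0, new_max=1.0):
    """Min-max normalization of v onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min

def z_score(v, mu, sigma):
    """Z-score normalization of v given mean mu and standard deviation sigma."""
    return (v - mu) / sigma

mm = min_max(45, ages)              # (45 - 13) / (72 - 13)
z = z_score(45, mean(ages), 20.64)  # sigma as stated in the question

print(round(mm, 3), round(z, 3))
```

Under these assumptions the two normalized values come out at roughly 0.542 and 0.478.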
3 (b)
State the Apriori property. Generate large itemsets and association rules using
the Apriori algorithm on the following data set, with minimum support and
minimum confidence set to 50% and 75%, respectively.
TID | Items Purchased |
T101 | Cheese, Milk, Cookies |
T102 | Butter, Milk, Bread |
T103 | Cheese, Butter, Milk, Bread |
T104 | Butter, Bread |
7 M
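A small level-wise Apriori sketch in Python for verifying a hand-worked answer to 3 (b). It is one possible reading of the algorithm, not a reference implementation; the function names `support`, `apriori` and `rules` are illustrative. Support is computed as a fraction of the four transactions.

```python
from itertools import combinations

transactions = {
    "T101": {"Cheese", "Milk", "Cookies"},
    "T102": {"Butter", "Milk", "Bread"},
    "T103": {"Cheese", "Butter", "Milk", "Bread"},
    "T104": {"Butter", "Bread"},
}
MIN_SUP = 0.50
MIN_CONF = 0.75

def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions.values()) / len(transactions)

def apriori():
    """Return {frequent itemset: support}, built level by level."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUP]
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUP]
    return frequent

def rules(frequent):
    """Return (antecedent, consequent, confidence) with confidence >= MIN_CONF."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]
                if conf >= MIN_CONF:
                    out.append((lhs, itemset - lhs, conf))
    return out

frequent = apriori()
found = rules(frequent)
for lhs, rhs, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```

The pruning step is where the Apriori property is used: a candidate is discarded without counting if any of its subsets is already known to be infrequent.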
3 (c)
What is noise? Explain data smoothing as a noise removal technique: partition the
given data into equal-frequency bins of size 3, then smooth by bin means, by bin
medians and by bin boundaries. Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22
7 M
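A short Python sketch of the binning steps asked for in 3 (c), intended only as a check on a hand-worked answer. The function names are illustrative. One assumption is needed: when a value is equally distant from both bin boundaries, this sketch assigns it to the lower boundary; the question does not state a tie rule.

```python
from statistics import mean, median

data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
BIN_SIZE = 3

def equal_frequency_bins(values, size):
    """Sort the values and cut them into consecutive bins of `size` items."""
    s = sorted(values)
    return [s[i:i + size] for i in range(0, len(s), size)]

def smooth_by(bins, fn):
    """Replace every value in a bin by fn(bin), e.g. the bin mean or median."""
    return [[fn(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min and max."""
    out = []
    for b in bins:
        lo, hi = b[0], b[-1]
        # Assumption: ties go to the lower boundary.
        out.append([lo if v - lo <= hi - v else hi for v in b])
    return out

bins = equal_frequency_bins(data, BIN_SIZE)
by_means = smooth_by(bins, mean)
by_medians = smooth_by(bins, median)
by_bounds = smooth_by_boundaries(bins)

print(bins, by_means, by_medians, by_bounds, sep="\n")
```

For this data set the middle value of every bin is equidistant from both boundaries, so the boundary-smoothed result depends entirely on the tie rule chosen.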
3 (d)
List two shortcomings of the Apriori algorithm whose remedies helped improve its
efficiency. Discuss any TWO variations of the Apriori algorithm that improve its
efficiency.
7 M
4 (a)
How does the K-Means clustering method differ from the K-Medoids clustering method?
Discuss the process of K-Means clustering. Also outline the major drawbacks of the K-Means clustering technique.
7 M
4 (b)
Explain how the accuracy of a classifier can be measured. How does the bagging strategy
help improve classifier accuracy?
7 M
4 (c)
What is supervised learning? Using the given table, show how the ROOT
splitting attribute is selected using the information gain (InfoGain) measure in the overall process of decision tree induction.
No. | Outlook | Temperature | Humidity | Windy | Class |
1 | Sunny | Hot | High | FALSE | N |
2 | Sunny | Hot | High | TRUE | N |
3 | Overcast | Hot | High | FALSE | P |
4 | Rain | Mild | High | FALSE | P |
5 | Rain | Cool | Normal | FALSE | P |
6 | Rain | Cool | Normal | TRUE | N |
7 | Overcast | Cool | Normal | TRUE | P |
8 | Sunny | Mild | High | FALSE | N |
9 | Sunny | Cool | Normal | FALSE | P |
10 | Rain | Mild | Normal | FALSE | P |
11 | Sunny | Mild | Normal | TRUE | P |
12 | Overcast | Mild | High | TRUE | P |
13 | Overcast | Hot | Normal | FALSE | P |
14 | Rain | Mild | High | TRUE | N |
7 M
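A minimal Python sketch of root-attribute selection by information gain for 4 (c), useful for checking a hand calculation. The rows are copied from the table in the question; the names `entropy`, `info_gain`, `gains` and `root` are illustrative.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]
ROWS = [  # (Outlook, Temperature, Humidity, Windy, Class)
    ("Sunny", "Hot", "High", "FALSE", "N"),
    ("Sunny", "Hot", "High", "TRUE", "N"),
    ("Overcast", "Hot", "High", "FALSE", "P"),
    ("Rain", "Mild", "High", "FALSE", "P"),
    ("Rain", "Cool", "Normal", "FALSE", "P"),
    ("Rain", "Cool", "Normal", "TRUE", "N"),
    ("Overcast", "Cool", "Normal", "TRUE", "P"),
    ("Sunny", "Mild", "High", "FALSE", "N"),
    ("Sunny", "Cool", "Normal", "FALSE", "P"),
    ("Rain", "Mild", "Normal", "FALSE", "P"),
    ("Sunny", "Mild", "Normal", "TRUE", "P"),
    ("Overcast", "Mild", "High", "TRUE", "P"),
    ("Overcast", "Hot", "Normal", "FALSE", "P"),
    ("Rain", "Mild", "High", "TRUE", "N"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Entropy of the class minus the weighted entropy after splitting on col."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {a: info_gain(ROWS, i) for i, a in enumerate(ATTRS)}
root = max(gains, key=gains.get)

for attr, g in gains.items():
    print(f"{attr}: {g:.3f}")
print("root:", root)
```

The attribute with the largest gain is chosen as the root; on this table that is Outlook, with a gain of about 0.247 bits.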
4 (d)
Explain Linear Regression and Non-linear Regression techniques of prediction.
7 M
5 (a)
What is a web log? Explain web structure mining and web usage mining in detail.
7 M
5 (b)
Discuss the applications of data warehousing and data mining in the government sector.
7 M
5 (c)
Explain the information retrieval methods used in text mining.
7 M
5 (d)
What are neural networks? Describe the various factors that make them useful
for classification and prediction in data mining. Explain how the topology of
a neural network is designed.
7 M