1 (a)
Explain the different OLAP operations with examples.
7 M
1 (b) (i)
What are the major challenges of mining a huge amount of data in
comparison with mining a small amount of data?
4 M
1 (b) (ii)
Why is a strong association rule not always interesting? Explain with an example.
3 M
2 (a)
Suppose that a data warehouse consists of the three dimensions time, doctor, and
patient, and the two measures count and charge, where charge is the fee that a
doctor charges a patient for a visit.
1) Draw a star schema diagram for the data warehouse.
2) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?
7 M
2 (b)
Define sampling. Explain the different types of sampling techniques with examples.
7 M
2 (c)
What is noise? Explain the different techniques to remove the noise from data.
7 M
3 (a)
How is the dissimilarity between objects computed when they are described by the
following types of variables:
1) Interval-scaled variables
2) Asymmetric binary variables
3) Categorical variables.
7 M
3 (b)
How can multilevel association rules be mined efficiently using a concept hierarchy?
7 M
3 (c)
Suppose that the data mining task is to cluster the following eight points (with (x,
y) representing location) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show:
1) The three cluster centers after the first round of execution
2) The final three clusters
7 M
3 (d)
Explain linear regression. What are the reasons for not using the linear regression
model to estimate the output data?
7 M
4 (a)
What is decision tree induction? Write the basic algorithm for inducing a decision tree from training tuples.
7 M
4 (b) (i)
List the strengths and weaknesses of a neural network as a classifier.
4 M
4 (b) (ii)
How can distance be computed for attributes that have missing values in the K-Nearest Neighbour classifier?
3 M
4 (c)
A database has 5 transactions. Let min_sup = 60% and min_conf = 80%.
TID | items_bought |
T100 | {M,O,N,K,E,Y} |
T200 | {D,O,N,K,E,Y} |
T300 | {M,A,K,E} |
T400 | {M,U,C,K,Y} |
T500 | {C,O,O,K,I,E} |
1) Find all frequent itemsets using the Apriori algorithm.
2) List all the association rules (with support s and confidence c) matching the following meta rule, where X is a variable representing customers, and item_i denotes variables representing items (e.g., "A", "B", etc.):
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) → buys(X, item3) [s, c].
7 M
4 (d)
What are the methods to evaluate the accuracy of a classifier/predictor?
7 M
5 (a)
Write a short note on web usage mining.
7 M
5 (b)
Discuss the basic principle of Attribute-Oriented Induction.
7 M
5 (c)
What is a time-series database? How can time-series data be characterized using trend analysis?
7 M
5 (d) (i)
What are the measures for assessing the quality of a text retrieval system?
3 M
5 (d) (ii)
What are the terminating conditions to stop the training process of a neural network classifier?
4 M