A Technical Approach on Large Data Distributed Over a Network
DOI: https://doi.org/10.12777/ijse.2.2.34-41
Abstract
Data mining is nontrivial extraction of implicit, previously unknown and potential useful information from the data. For a database with number of records and for a set of classes such that each record belongs to one of the given classes, the problem of classification is to decide the class to which the given record belongs. The classification problem is also to generate a model for each class from given data set. We are going to make use of supervised classification in which we have training dataset of record, and for each record the class to which it belongs is known. There are many approaches to supervised classification. Decision tree is attractive in data mining environment as they represent rules. Rules can readily expressed in natural languages and they can be even mapped o database access languages. Now a days classification based on decision trees is one of the important problems in data mining which has applications in many areas. Now a days database system have become highly distributed, and we are using many paradigms. we consider the problem of inducing decision trees in a large distributed network of highly distributed databases. The classification based on decision tree can be done on the existence of distributed databases in healthcare and in bioinformatics, human computer interaction and by the view that these databases are soon to contain large amounts of data, characterized by its high dimensionality. Current decision tree algorithms would require high communication bandwidth, memory, and they are less efficient and scalability reduces when executed on such large volume of data. So there are some approaches being developed to improve the scalability and even approaches to analyse the data distributed over a network.[keywords: Data mining, Decision tree, decision tree induction, distributed data, classification]