- Title
A systematic construction of non-i.i.d. data sets from a single data set: non-identically distributed data.
- Authors
Torra, Vicenç
- Abstract
Data-driven models strongly depend on data. Nevertheless, for research and academic purposes, public data sets are usually considered and analyzed. For example, most machine learning algorithms are applied and tested using the UCI Machine Learning repository. There is a current need for non-i.i.d. data sets for distributed machine learning, where i.i.d. stands for independent and identically distributed random variables. An example of this need is federated learning. In federated learning, the typical scenario is to consider a set of agents, each with its own data set. Agents are typically heterogeneous and, because of that, it is not appropriate to assume that the data of these agents follow the same distribution. In this paper we propose an approach to build non-identically distributed data sets from a single data set for machine learning classification, where we may or may not suppose that all instances follow the same distribution. Each device will have only instances of a subset of the classes. The approach uses optimization to distribute the data set into a set of subsets, each one following a different distribution. Our goal is to define an approach for building subsets for training that is as systematic as the approaches used for cross-validation/k-fold validation.
- Subjects
RANDOM variables; INDEPENDENT variables
- Publication
Knowledge & Information Systems, 2023, Vol 65, Issue 3, p991
- ISSN
0219-1377
- Publication type
Article
- DOI
10.1007/s10115-022-01785-3
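- Illustrative sketch
The setting described in the abstract, where each device holds only instances of a subset of the classes, can be illustrated with a small Python sketch. This is not the optimization-based method of the article, whose details are not given in this record. It is a simple stand-in that assigns each device a cyclic window of classes and splits each class's instances among the devices that own it. The function name `partition_by_class_subsets` and the parameters `n_devices`, `classes_per_device`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def partition_by_class_subsets(labels, n_devices, classes_per_device, seed=0):
    """Split instance indices across devices so that each device sees only
    a subset of the classes (a simple label-skew partition).

    Assumption: a cyclic class assignment replaces the optimization step
    mentioned in the abstract; it is illustrative only.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n_classes = len(classes)
    if not 1 <= classes_per_device <= n_classes:
        raise ValueError("classes_per_device must be between 1 and the number of classes")

    # Device d owns the window of classes d, d+1, ..., d+k-1 (mod n_classes).
    owners = defaultdict(list)
    for d in range(n_devices):
        for j in range(classes_per_device):
            owners[classes[(d + j) % n_classes]].append(d)

    # Every class needs at least one owner, otherwise its instances are lost.
    missing = [c for c in classes if c not in owners]
    if missing:
        raise ValueError(f"no device owns classes {missing}")

    # Group instance indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Shuffle each class and deal its instances round-robin to its owners.
    parts = [[] for _ in range(n_devices)]
    for c in classes:
        idx = by_class[c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            parts[owners[c][pos % len(owners[c])]].append(i)
    return parts
```

For instance, calling `partition_by_class_subsets(labels, n_devices=3, classes_per_device=2)` on a three-class label list gives three index lists, each containing exactly two of the three classes, so the per-device label distributions differ by construction.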