Torra, Vicenç

doi:10.1007/s10115-022-01785-3

Title: A systematic construction of non-i.i.d. data sets from a single data set: non-identically distributed data.
Authors: Torra, Vicenç
Abstract: Data-driven models depend strongly on data. Nevertheless, for research and academic purposes, public data sets are usually considered and analyzed. For example, most machine learning algorithms are applied and tested using the UCI Machine Learning Repository. There is a current need for non-i.i.d. data sets for distributed machine learning, where i.i.d. stands for independent and identically distributed random variables. An example of this need is federated learning. In federated learning, the typical scenario considers a set of agents, each with its own data set. Agents are typically heterogeneous, so it is not appropriate to assume that their data follow the same distribution. In this paper we propose an approach to build non-identically distributed data sets from a single data set for machine learning classification, whether or not we suppose that all instances follow the same distribution. Each device will have instances of only a subset of the classes. The approach uses optimization to distribute the data set into a set of subsets, each one following a different distribution. Our goal is to define an approach for building subsets for training that is as systematic as the approaches used for cross-validation/k-fold validation.
Subjects: RANDOM variables; INDEPENDENT variables
Publication: Knowledge & Information Systems, 2023, Vol 65, Issue 3, p991
ISSN: 0219-1377
Publication type: Article
DOI: 10.1007/s10115-022-01785-3
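
Illustration: the abstract describes splitting one classification data set so that each device holds instances of only a subset of the classes. The record does not give the paper's optimization formulation, so the sketch below is not the author's method. It is a simple rotating-window heuristic, shown only to make the idea of a class-restricted (label-skew) partition concrete. The function name `label_skew_partition` and its parameters `num_devices`, `classes_per_device`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def label_skew_partition(labels, num_devices, classes_per_device, seed=0):
    """Split instance indices into per-device subsets with restricted classes.

    Hypothetical helper: a rotating-window heuristic, NOT the
    optimization-based approach proposed in the paper.
    Returns (subsets, device_classes), where subsets[d] is a list of
    instance indices and device_classes[d] lists the classes device d may hold.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    if not 1 <= classes_per_device <= len(classes):
        raise ValueError("classes_per_device must be between 1 and the number of classes")

    # Give each device a window of consecutive classes, wrapping around.
    device_classes = [
        [classes[(d * classes_per_device + j) % len(classes)]
         for j in range(classes_per_device)]
        for d in range(num_devices)
    ]

    # Record which devices are allowed to hold each class.
    holders = defaultdict(list)
    for d, allowed in enumerate(device_classes):
        for c in allowed:
            holders[c].append(d)

    # Group instance indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Deal each class's shuffled instances round-robin to its holders.
    # Classes with no holder (too few devices) are left unassigned.
    subsets = [[] for _ in range(num_devices)]
    for c, idxs in by_class.items():
        devs = holders.get(c)
        if not devs:
            continue
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            subsets[devs[pos % len(devs)]].append(i)
    return subsets, device_classes


# Usage: 20 instances over 4 classes, 4 devices holding 2 classes each.
labels = [0, 1, 2, 3] * 5
subsets, device_classes = label_skew_partition(labels, num_devices=4, classes_per_device=2)
```

Because every device sees only its own window of classes, the per-device label distributions differ by construction. All classes are covered only when `num_devices * classes_per_device` is at least the number of classes.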
