Sedimentary Environment Analysis by Grain-Size Data Based on Mini Batch K-Means Algorithm
Hindawi, Geofluids, Volume 2018, Article ID 8519695, 11 pages. https://doi.org/10.1155/2018/8519695
Project introduction: Data mining, knowledge discovery, and machine learning algorithms have permeated research in virtually every field [1–4]. The complex network, a significant method of data mining, focuses on discovering concealed information between things. Consequently, many researchers in fields including mathematics, physics, biology, chemistry, and oceanology have used complex networks to explore potential relationships in data [5–9]. Complex networks have several characteristic properties: self-similarity, self-organization, scale-free behavior, the small-world property, community structure (clusters), and node centrality. Community structure is one of the most important of these because it can objectively reflect the potential relationships between nodes. A community is a group of nodes that are densely linked to one another but only sparsely linked to nodes in other communities [10, 11].
Grain-size analysis is one of the basic tools for classifying sedimentary environments and can provide important clues to provenance, transport history, and depositional conditions. In general, the representative statistical parameters of grain-size analysis are the median, mode, mean, separation parameter (sorting), skewness, and kurtosis. Over the last few decades, two methods for computing grain-size parameters have been developed: the graphical method and the moment method. Blott and Pye (2011) showed that each of the two methods has advantages and disadvantages when computing the parameters of sediment grain-size samples. As most sediments are polymodal, curve shape and statistical measures usually simply reflect the relative magnitude and separation of populations. A polymodal grain-size spectrum can be regarded as the superposition of several unimodal components.
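As an illustration of the moment method mentioned above, the following is a minimal sketch that computes the mean, sorting, skewness, and kurtosis of one sample from class midpoints and weight percentages. The function name `moment_statistics`, the use of phi units, and the sample values are assumptions made for this example; they are not taken from the paper.

```python
import math

def moment_statistics(midpoints_phi, weight_percent):
    """Return (mean, sorting, skewness, kurtosis) by the method of moments."""
    total = sum(weight_percent)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the weights act as a discrete probability distribution.
    freqs = [w / total for w in weight_percent]
    pairs = list(zip(freqs, midpoints_phi))
    mean = sum(f * m for f, m in pairs)
    variance = sum(f * (m - mean) ** 2 for f, m in pairs)
    sorting = math.sqrt(variance)  # standard deviation of the distribution
    if sorting == 0:
        raise ValueError("all material is in one class; higher moments are undefined")
    skewness = sum(f * (m - mean) ** 3 for f, m in pairs) / sorting ** 3
    kurtosis = sum(f * (m - mean) ** 4 for f, m in pairs) / sorting ** 4
    return mean, sorting, skewness, kurtosis

# Hypothetical sample: five size classes (phi midpoints) and their weight percentages.
midpoints = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [10.0, 20.0, 40.0, 20.0, 10.0]
mean, sorting, skewness, kurtosis = moment_statistics(midpoints, weights)
print(mean, sorting, skewness, kurtosis)
```

Because the hypothetical distribution is symmetric about its central class, its skewness is zero up to rounding. For a polymodal sample the same four numbers summarise the mixture as a whole, which is why they say little about the individual populations.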
Many works have shown that different grain-size distributions are related to specific transport and deposition processes. Three kinds of functions are commonly used to fit the grain-size distribution: the Normal, Lognormal, and Weibull functions. Based on experimental results, Sun et al. found that the Weibull function was appropriate for the mathematical description of the grain-size distribution of all kinds of sediments, while the Normal function was also acceptable for fluvial and lacustrine sediments. Although these methods, especially the Weibull function, perform well in fitting sediment grain-size distributions, they often rely on the subjective experience of the researcher, and definite criteria for environmental determination have not been given. Based on data from borehole Lz908, Yi et al. analyzed the evolution of the sedimentary environment. Besides grain-size data, they also used magnetic susceptibility, tree pollen, radiocarbon dating, and optically stimulated luminescence (OSL) dating [16, 17]. Can the same conclusion be obtained using only grain-size data, which are relatively convenient and inexpensive indices? In this paper, we introduce the complex network into the modeling of sediment grain-size data. Based on the theory of bipartite graphs, we construct a Sample/Grain-Size bipartite weighted network model that can objectively reflect the associations between sediment samples and grain sizes. By projection, we then construct the Sample network model from the bipartite network. After repeated testing of tens of representative clustering algorithms, we selected the Mini Batch K-means algorithm, an optimization that combines the K-means algorithm with the classical batch algorithm, to split the Sample nodes into categories and find the relationships between sedimentary environment and grain size.
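The construction of a Sample/Grain-Size bipartite weighted network and its projection onto the Sample nodes can be sketched as follows. The passage above does not state the edge weights or the projection rule, so this example assumes that an edge weight is the percentage content of a grain-size class in a sample, and that two samples are linked with a weight equal to their summed overlap across shared classes. The function names and the three samples are hypothetical.

```python
from itertools import combinations

def build_bipartite(samples):
    """Return weighted edges {(sample, size_class): percent} for non-zero contents."""
    edges = {}
    for sample_id, distribution in samples.items():
        for size_class, percent in distribution.items():
            if percent > 0:
                edges[(sample_id, size_class)] = percent
    return edges

def project_onto_samples(edges):
    """Project the bipartite network onto the Sample nodes.

    Assumed rule (not from the paper): the weight between two samples is
    the sum of min(w_a, w_b) over the grain-size classes they share.
    """
    by_sample = {}
    for (sample_id, size_class), weight in edges.items():
        by_sample.setdefault(sample_id, {})[size_class] = weight
    projected = {}
    for a, b in combinations(sorted(by_sample), 2):
        shared = by_sample[a].keys() & by_sample[b].keys()
        overlap = sum(min(by_sample[a][c], by_sample[b][c]) for c in shared)
        if overlap > 0:
            projected[(a, b)] = overlap
    return projected

# Hypothetical data: percentage content per grain-size class for three samples.
samples = {
    "S1": {"clay": 60.0, "silt": 40.0, "sand": 0.0},
    "S2": {"clay": 50.0, "silt": 45.0, "sand": 5.0},
    "S3": {"clay": 0.0, "silt": 10.0, "sand": 90.0},
}
edges = build_bipartite(samples)
sample_network = project_onto_samples(edges)
print(sample_network)
```

Under this assumed rule, the two fine-grained samples S1 and S2 end up strongly linked, while the sandy sample S3 is only weakly linked to either. A clustering step on the Sample network is meant to recover this kind of structure.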
After 400 tests, we find the most appropriate parameters for the Mini Batch K-means algorithm. Finally, we use four evaluation indices (AMI, NMI, completeness, and precision) to verify the accuracy and efficiency of the clustering divisions.
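To make the clustering and evaluation steps concrete, here is a minimal pure-Python sketch of a mini-batch K-means update loop, followed by one of the named indices, NMI. It is not the authors' implementation. The farthest-point initialisation, the arithmetic-mean normalisation in `nmi`, the parameter values, and the synthetic two-dimensional data are all assumptions chosen to keep the example short and reproducible.

```python
import math
import random
from collections import Counter

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def farthest_point_init(data, k):
    # Assumed deterministic initialisation: start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [list(data[0])]
    while len(centers) < k:
        nxt = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(nxt))
    return centers

def mini_batch_kmeans(data, k, batch_size, n_iter, seed=0):
    rng = random.Random(seed)
    centers = farthest_point_init(data, k)
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(data, batch_size)
        assigned = [nearest(x, centers) for x in batch]  # assign first, then update
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate shrinks over time
            centers[j] = [(1 - eta) * c + eta * p for c, p in zip(centers[j], x)]
    return [nearest(x, centers) for x in data], centers

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(true_labels, pred_labels):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    t_cnt, p_cnt = Counter(true_labels), Counter(pred_labels)
    mi = sum((c / n) * math.log(c * n / (t_cnt[t] * p_cnt[p]))
             for (t, p), c in joint.items())
    denom = (entropy(true_labels) + entropy(pred_labels)) / 2
    return mi / denom if denom > 0 else 1.0

# Synthetic, well-separated data standing in for projected sample features.
rng = random.Random(42)
data, truth = [], []
for label, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]):
    for _ in range(30):
        data.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
        truth.append(label)

labels, centers = mini_batch_kmeans(data, k=3, batch_size=20, n_iter=50)
score = nmi(truth, labels)
print(score)
```

In a parameter search of the kind described above, values such as `k`, `batch_size`, and `n_iter` would be varied and each run scored against reference labels. Which parameters the authors actually tuned is not stated in this passage.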