Human-knowledge-augmented Gaussian process regression for state-of-health prediction of lithium-ion batteries with charging curves

Lithium-ion batteries have been widely used in renewable energy storage and electric vehicles, and State-of-Health (SoH) prediction is critical for battery safety and reliability. Following the standard SoH prediction routine based on charging curves, a human-knowledge-augmented Gaussian process regression (HAGPR) model is proposed by incorporating two promising artificial intelligence techniques, i


Introduction
The International Energy Agency (IEA) predicted that, in ten years' time, the share of renewables in global electricity supply will be 60% [1] and the global electrified vehicle stock will reach 245 million [2]. Lithium-ion batteries are core components that have been widely adopted in renewable energy storage and electrified transport systems, e.g., wearable equipment [3], [4], smart grids [5], [6], energy harvesting systems [7], [8], hybrid vehicles [9], [10], and electric vehicles [11], [12]. Estimation and prediction of battery states, e.g., state-of-charge (SoC) and state-of-health (SoH), are critical for the design of battery management systems (BMS) [13], [14], battery thermal management systems (BTMS) [15], [16], and battery-integrated systems (e.g., hybrid vehicles [17], [18]) to allow safe and reliable operation of lithium-ion batteries. They are also important for the control functionalities in energy systems, e.g., charge sustaining control [19], [20] and energy management systems [21], [22] of electrified vehicles. Estimation and prediction of SoH are essential for judging battery aging and can provide guidance for reasonable battery use [23], [24]. Compared with SoC, SoH is more challenging to estimate and predict, where estimation calculates the SoH at the current state and prediction forecasts the SoH in the future.
To enable accurate SoH estimation and prediction, analytic models and data-driven models have been developed as the two mainstream approaches in recent years. Analytic models, including physics-based electrochemical models, semi-empirical models, and empirical models, are developed based on the physical and chemical mechanisms of battery degradation [25]. Mevawalla et al. proposed a mathematical model based on experimental measurement of the tab/surface temperature, separator, electrolyte resistance, and anode-cathode irreversible and reversible heat [26]. The analytic models have improved the understanding of battery degradation mechanisms; however, it remains very challenging for them to simulate all underlying dynamics in battery degradation at a fast computing speed.
With rapid development in artificial intelligence (AI) and information technology (IT), it has become feasible to estimate and predict SoH at a fast simulation speed by using data-driven models, which are mainly developed using parametric and non-parametric methods. Parametric methods are used for building support vector machines (SVM), relevance vector machines (RVM), and artificial neural networks (ANN). Nuhic et al. implemented an SVM for SoH estimation [27]. Hu et al. enabled online SoH estimation based on sparse Bayesian learning, where an RVM is employed for probabilistic kernel regression [28]. Garg et al. proposed a genetic programming method for modelling battery aging with multidisciplinary parameters [29]. Panchal implemented an ANN model for prediction of battery performance at different discharging rates and boundary temperature conditions [30].
With significantly better representation capability and prediction accuracy than parametric methods (e.g., SVM and ANN), Gaussian process regression (GPR) has been recognised as a promising machine learning technique in recent years [31]. As a well-known non-parametric method, GPR treats the input-to-output mapping as a random function with a probability density defined using a Gaussian process prior. Liu et al. used Gaussian process regression to enable prognostics of battery SoH [32]. Li et al. implemented a mixed model of GPR and particle filtering to predict the SoH of a battery under uncertain conditions [33]. Richardson et al. proposed a GPR method for forecasting battery SoH and highlighted various advantages of GPR over other data-driven and mechanistic approaches [34]. Yang et al. proposed a GPR model based on charging curves to enable battery SoH estimation [35]. Although GPR models can achieve promising results, obtaining robust AI-based SoH prediction models with less data and a faster learning process is still a challenging task.
Feature extraction is an important technique for obtaining robust and accurate models. Conventionally, features are extracted by calculating the mean, maximum, and minimum values of the data in the time domain or frequency domain. In recent years, feature extraction functions have been integrated in AI models. Tagade et al. developed a deep learning model for simultaneous battery capacity estimation, end-of-life prediction, and degradation mode diagnosis, in which a feature vector is introduced for feature extraction and trained as a part of the deep network [36]. In Tagade's model, the feature extraction is purely data-driven and somewhat increases the computational effort by introducing more parameters for neural network training. The authors developed a fuzzy feature extraction method for driving behavior identification [37]. By defining fuzzy rules and membership functions based on human knowledge, the fuzzy feature extraction method is shown to be superior to the conventional method with time domain and frequency domain data. Recently, adaptive neuro-fuzzy inference systems (ANFISs) have demonstrated advantages in machine learning as they can incorporate heuristic human knowledge in data-driven modelling [38]. As many theoretical and experimental studies have been conducted, the

A c c e p t e d M a n u s c r i p t N o t C o p y e d i t e d
extractable features from human knowledge for battery state estimation and prediction can be the shape of charging voltage curves [35] and the chemical and physical mechanisms of battery degradation [25]. However, research on feature extraction with the ANFIS for SoH prediction is scarcely reported.
To address the challenges illustrated above, this paper proposes an improved GPR model that can incorporate human knowledge with a deep architecture. Following the standard battery diagnosis routine that is based on charging curves [35], the present work is conducted with two original contributions: 1) An adaptive neural fuzzy inference system (ANFIS) is developed for the first time to enable feature extraction based on human knowledge to reduce the need for physical battery aging testing; 2) A human-knowledge-augmented Gaussian process regression (HAGPR) model is proposed by incorporating the ANFIS with a Gaussian process model to enhance the SoH prediction capability.
The rest of this paper is organized as follows. Section 2 introduces the battery aging test setup, the testing datasets, and the features in the charging curves. The HAGPR network is proposed in Section 3. Section 4 discusses the results of the experimental evaluation of the proposed HAGPR network, and conclusions are summarized in Section 5.

Experiment setup and battery datasets
The cyclic aging datasets of lithium-ion batteries are obtained from an open-source data repository of the NASA Ames Prognostics Centre of Excellence (PCoE) [39]. The data were collected from a battery prognostics test bed at room temperature (24 ℃) with four commercially available 18650 battery cells, i.e., No.5, No.6, No.7, and No.18. The nominal voltage and capacity of the 18650 cell are 3.7 V and 2600 mAh, respectively [39]. The aging testing was conducted over several charge-discharge cycles. Each cycle started with a discharging process that was followed by a charging process. For the discharging process, the battery was discharged with a constant current (CC) until the battery voltage was less than the lower limit. The discharging current and voltage limit for the different batteries are listed in Table 1 [40]. For the charging process, the battery was first charged with a CC of 1.5 A until the battery voltage reached 4.2 V and then charged with a constant voltage (CV) of 4.2 V until the battery current dropped to 30 mA [39].

Table 1. Discharge conditions of the studied batteries
Columns: Battery No. | Discharge current (A) | Voltage lower limit (V) | Charging conditions: Voltage upper limit (V)

The attenuation curves (SoH vs. number of cycles) of the four studied battery cells are illustrated in Fig. 1. In general, battery SoH decreases during the aging testing because of battery degradation. Because of the regeneration phenomenon, the battery SoH does not decrease monotonically as the testing cycles progress. This small-range capacity rise has a significant influence on the accuracy and precision of SoH prediction. Therefore, investigation of the features in charging curves that influence SoH performance is necessary to obtain advanced SoH models.

Representative features in charge curves
Taking the aging testing data of battery No.6 as an example, the charging curves of cycles 1, 50, 80, and 130 are compared in Fig. 2 (a), which indicates that the CC charging period has a decreasing trend while the CV charging period has an increasing trend as the aging testing progresses. The shapes of the CC charging curves also vary during the aging testing. According to the geometrical analysis of charging curves at different numbers of cycles as in [35], four representative features, i.e., F1, F2, F3, and F4, as shown in Fig. 2 (b), are selected for SoH prediction as defined in Table 2.

Table 2. Definition of the four features in charging curves [35]
Feature | Description | Mathematical model
$F_1$ | The time of the CC charge stage | $F_1 = t_{cc} - t_0$, where $t_0$ is the start time of the CCCV charging cycle; and $t_{cc}$ is the time when the battery voltage first equals the upper voltage limit of 4.2 V.
$F_2$ | The time of the CV charge stage | $F_2 = t_{cv} - t_{cc}$, where $t_{cv}$ is the time when the battery charging current becomes less than 30 mA.
$F_3$ | The transient voltage changing rate at the boundary between the CC stage and the CV stage | $F_3 = \frac{V(t_{cc}) - V(t_{cc} - 1)}{\Delta t}$, where $\Delta t$ is the sampling time to obtain the charging curve; and $V(t_{cc} - 1)$ is the measured voltage one sample step earlier than $t_{cc}$.
$F_4$ | The average voltage changing rate at the CC stage | $F_4 = \frac{V(t_{cc}) - V_0}{t_{cc} - t_0}$, where $V_0$ is the battery voltage at the beginning of the CCCV cycle.
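The four feature definitions can be sketched in code. The following is a minimal illustration rather than the authors' implementation: the function name `charging_features`, the uniform sampling step, and the synthetic ten-sample cycle are all assumptions made for this example, and the 4.2 V and 30 mA thresholds are taken from the charging protocol described in Section 2.

```python
import numpy as np

def charging_features(t, v, i, v_limit=4.2, i_cutoff=0.03):
    """Return (F1, F2, F3, F4) for one CC-CV charging cycle.

    t, v, i: 1-D arrays of time (s), voltage (V), and current (A),
    sampled at a uniform step (an assumption of this sketch).
    """
    t, v, i = (np.asarray(a, dtype=float) for a in (t, v, i))
    dt = t[1] - t[0]                       # sampling time
    k_cc = int(np.argmax(v >= v_limit))    # first sample at the voltage limit
    # first sample from the CC/CV boundary onward with current below the cutoff
    k_cv = k_cc + int(np.argmax(i[k_cc:] < i_cutoff))
    f1 = t[k_cc] - t[0]                    # duration of the CC stage
    f2 = t[k_cv] - t[k_cc]                 # duration of the CV stage
    f3 = (v[k_cc] - v[k_cc - 1]) / dt      # voltage slope at the CC/CV boundary
    f4 = (v[k_cc] - v[0]) / f1             # average voltage slope over the CC stage
    return f1, f2, f3, f4

# Synthetic cycle (hypothetical values, not NASA data)
t = np.arange(10.0)
v = np.array([3.8, 3.9, 4.0, 4.1, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2])
i = np.array([1.5, 1.5, 1.5, 1.5, 1.5, 1.0, 0.5, 0.1, 0.02, 0.01])
f1, f2, f3, f4 = charging_features(t, v, i)
```

Note that `np.argmax` returns 0 when no sample satisfies the condition, so a real pipeline would need to check that the voltage limit and current cutoff are actually reached in each cycle.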

Proposed methodology
Following the standard SoH estimation routine based on charging curves [35], the human-knowledge-augmented Gaussian process regression (HAGPR) method, as illustrated in Fig. 3, is proposed with four main procedures, i.e., 1) preparation, 2) feature extraction, 3) SoH estimation, and 4) experimental evaluation, which are highlighted with blue, orange, green, and yellow, respectively.

First, the datasets obtained from battery aging testing are pre-processed and partitioned into a learning set and a verification set. Four features, as identified in Section 2.2, are selected as the inputs of the SoH estimation model. The Min-Max method is used to normalize the feature values between 0 and 1 by
$F_i' = \frac{F_i - \min(F_i)}{\max(F_i) - \min(F_i)}$
where $F_i$ is a feature vector of the $i$th feature ($i = 1,2,3,4$) that contains the feature values calculated with the battery aging test data using the equations in Section 2.2; $\min(F_i)$ and $\max(F_i)$ are the minimum value and the maximum value of the feature vector, respectively; and $F_i'$ is the normalized feature vector.
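As an illustration of the Min-Max step, the sketch below scales each feature column to the range [0, 1]. The function name `min_max_normalize` and the small example matrix are hypothetical, and the sketch assumes that no feature is constant across cycles (otherwise the denominator would be zero).

```python
import numpy as np

def min_max_normalize(features):
    """Scale each column (one feature across all cycles) to [0, 1]."""
    features = np.asarray(features, dtype=float)
    f_min = features.min(axis=0)   # per-feature minimum
    f_max = features.max(axis=0)   # per-feature maximum
    return (features - f_min) / (f_max - f_min)

# Rows are cycles, columns are two hypothetical features (e.g., F1 and F2 in seconds)
raw = np.array([[3000.0, 1000.0],
                [2800.0, 1200.0],
                [2500.0, 1600.0]])
normalized = min_max_normalize(raw)
```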
Then, the proposed HAGPR model, as shown in Fig. 4, is developed by incorporating the ANFIS network (for feature extraction) and the GPR model (for SoH estimation). The inputs of the HAGPR model are the four normalized features, and the output is the battery SoH. The detailed procedures for ANFIS-based feature extraction and GPR-based SoH estimation are described as follows.

Fig. 4. Architecture of the HAGPR model

Feature extraction with adaptive neural fuzzy inference
The adaptive neural fuzzy inference system (ANFIS) network is adopted for feature extraction, and a Takagi-Sugeno fuzzy model is used to build the ANFIS network, as illustrated in Fig. 5, because it is easy to implement with data-driven learning [40]. The normalized feature values for each charging cycle are gathered in the input layer as an input vector $\mathbf{x} = [F_1', F_2', F_3', F_4']^T$, and the extracted feature, $F^* = y$, is generated in the output layer. The output, $y = \Phi(\mathbf{x})$, is calculated in three hidden layers using the input, $\mathbf{x}$. Three key components of a fuzzy inference system, i.e., the fuzzification module, the fuzzy rule base, and the output inferencing module, are included in the three hidden layers of the ANFIS network.
where $M_{1,h}$ is the $h$-th membership function for the first feature; $M_{2,i}$ is the $i$-th membership function for the second feature; $M_{3,j}$ is the $j$-th membership function for the third feature; $M_{4,k}$ is the $k$-th membership function for the fourth feature; and $c(n)$, $n = 1,2,3$, is the $n$-th element of the triangle parameter vector, $\mathbf{c}$.
The second hidden layer connects the outputs of the input membership functions based on fuzzy rules. Each fuzzy rule applies a linguistic logic with a constant-type membership function [37] and a scaling factor. A vector of weighting values is used in the inference, and N is the number of data points in the dataset.
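To make the three-layer inference concrete, the following sketch implements a generic zero-order Takagi-Sugeno forward pass with triangular membership functions. It is not the authors' trained network: the triangle formula is the standard textbook form, the product is assumed as the rule conjunction, and the evenly spaced membership functions and rule constants are hand-picked for illustration instead of being tuned by a genetic algorithm. All function and variable names are hypothetical.

```python
import itertools
import numpy as np

def tri_mf(x, c):
    """Triangular membership with feet c[0], c[2] and peak c[1]."""
    left = (x - c[0]) / (c[1] - c[0])
    right = (c[2] - x) / (c[2] - c[1])
    return max(min(left, right), 0.0)

def anfis_forward(x, mf_params, consequents):
    """Zero-order Takagi-Sugeno inference.

    x           : normalized inputs, shape (n_inputs,)
    mf_params   : mf_params[d][m] is the 3-element triangle of MF m for input d
    consequents : constant output of each rule, indexed by the tuple of MF indices
    """
    # Layer 1: fuzzify every input with each of its membership functions.
    mu = [[tri_mf(x[d], c) for c in mf_params[d]] for d in range(len(x))]
    num, den = 0.0, 0.0
    # Layer 2: one rule per combination of membership functions (product AND).
    for rule in itertools.product(*(range(len(m)) for m in mu)):
        w = np.prod([mu[d][m] for d, m in enumerate(rule)])
        num += w * consequents[rule]
        den += w
    # Layer 3: normalized weighted sum gives the extracted feature.
    return num / den if den > 0 else 0.0

# Five evenly spaced triangles per input on [0, 1] (illustrative choice)
centers = np.linspace(0.0, 1.0, 5)
mf_params = [[(c - 0.25, c, c + 0.25) for c in centers] for _ in range(4)]
# Illustrative rule constants: the mean of the rule's membership centers
consequents = np.zeros((5, 5, 5, 5))
for rule in itertools.product(range(5), repeat=4):
    consequents[rule] = centers[list(rule)].mean()

f_star = anfis_forward(np.array([0.1, 0.4, 0.6, 0.9]), mf_params, consequents)
```

With five membership functions on each of four inputs, the full rule base has 5^4 = 625 rules, which is why the rule constants are stored in a 5x5x5x5 array here.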

The computational effort in training the GPR model depends on the size of the covariance matrix generated by the input data. The most significant development in the present work is to reduce the covariance matrix size by reducing the number of features from four to one through the ANFIS network. As a data-driven learning method, GPR expects that data points with similar feature values naturally have close SoH values. To handle this similarity, kernel functions are widely adopted. This paper uses a squared exponential kernel function [31]
$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[-\frac{(x_i - x_j)^T (x_i - x_j)}{2\sigma_l^2}\right]$
where $\theta = [\log \sigma_l, \log \sigma_f]$ is a parameter vector; $\sigma_l$ is the characteristic length; and $\sigma_f$ is the signal's standard deviation.
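A minimal NumPy sketch of GPR with a squared exponential kernel on a single scalar feature is given below for illustration. It assumes a zero-mean prior, hand-picked hyperparameters instead of values fitted by maximum likelihood, and a small observation-noise term that the text above does not specify. The names `se_kernel` and `gpr_predict` and the synthetic feature-to-SoH data are hypothetical.

```python
import numpy as np

def se_kernel(xa, xb, sigma_l, sigma_f):
    """Squared exponential kernel between two sets of scalar inputs."""
    d = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / sigma_l) ** 2)

def gpr_predict(x_train, y_train, x_test, sigma_l=0.2, sigma_f=1.0, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP."""
    n = len(x_train)
    k = se_kernel(x_train, x_train, sigma_l, sigma_f) + noise**2 * np.eye(n)
    k_s = se_kernel(x_train, x_test, sigma_l, sigma_f)
    k_ss = se_kernel(x_test, x_test, sigma_l, sigma_f)
    chol = np.linalg.cholesky(k)                      # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha                              # posterior mean
    v = np.linalg.solve(chol, k_s)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)        # posterior variance
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic data: one extracted feature in [0, 1] mapped to SoH (illustrative only)
x_train = np.linspace(0.0, 1.0, 8)
y_train = 0.7 + 0.3 * x_train
soh_mean, soh_std = gpr_predict(x_train, y_train, np.array([0.5]))
```

The covariance matrix `k` has one row and column per learning data point, so the Cholesky factorization dominates the training cost.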

Pearson's correlation
By implementing five triangular membership functions for each of the input features, the ANFIS network for feature extraction is obtained based on the battery dataset No.6. A genetic algorithm was used to determine the optimal parameter vector $\mathbb{C}^*$ that achieves the minimal root mean squared error (RMSE) between the ANFIS output and the experimental data. Six 3D maps are shown in Fig. 6 a) to Fig. 6 f), which illustrate the input/output mapping from every two inputs to the extracted feature.

The absolute error can be controlled with more than 95% of the data located in the 5% error zone. The prediction performance depends highly on the number of cycles used for learning, and both the conventional GPR model and the HAGPR model have better prediction performance (smaller absolute errors with the validation data) if more cycle data are used for model learning. As shown by the blue solid lines in Fig. 8, the proposed HAGPR model has better prediction performance than the conventional GPR. This is because the human knowledge integrated in the ANFIS network can augment the information provided by the data, so the model can at least accurately predict the changing trend of the SoH curves when using little learning data, e.g., in Fig. 7 a). With learning data from more than 75 testing cycles, the HAGPR model can achieve acceptable prediction results, i.e., the absolute prediction error can be controlled with more than 95% of the data in the 5% error zone.

The regression plots of the GPR and HAGPR models for SoH estimation and prediction, which are obtained with different numbers of cycles for learning, are summarized in Fig. 9. The coefficients of determination, i.e., the R-squared values, are compared in Table 3. Both GPR and HAGPR have a strong capability for SoH estimation, achieving very high R-squared values (more than 0.9) when the input/output pairs are within the learning space used for model training. In terms of SoH prediction, where the input/output data have not been used for model training, both models perform poorly when the number of cycles for learning is 70, but HAGPR is much better than the conventional GPR, achieving higher R-squared values when the number of cycles for learning is higher than 75. Using 'more than 95% of the predicted data are located in the 5% error zone' as the acceptance condition, the HAGPR model needs 75 cycles of aging testing to meet the acceptance condition, while GPR needs 110 cycles. Therefore, the proposed HAGPR can save more than 31.8% of the aging testing needed to enable accurate battery SoH estimation compared with the conventional GPR method. In addition, this investigation shows that the HAGPR is capable of SoH prediction with a look-ahead window of similar size to the number of cycles used for learning. This property of HAGPR can help determine the number of learning cycles for battery aging experiments, and k-fold cross-validation methods [7], [41] can be used to validate the SoH prediction model built with limited data.

Robustness of the SoH prediction for different batteries
To verify the robustness of the proposed HAGPR model in SoH prediction for different batteries, the estimated SoH, the predicted SoH, and the absolute errors are obtained with the datasets No.5, No.7, and No.18. To make the comparison with the conventional GPR model fair, both the HAGPR model and the GPR model are trained with the same data of 115 testing cycles. As illustrated in Fig. 10, the proposed HAGPR robustly outperforms the conventional GPR model by obtaining smoother and more accurate SoH predictions.

Conclusions
This paper proposed a human-knowledge-augmented Gaussian process regression (HAGPR) model, which adopts an ANFIS network to incorporate human knowledge on battery degradation and enable feature extraction for battery SoH prediction with Gaussian process regression. Based on the aging testing data of four selected lithium-ion batteries, a comparison study is conducted to demonstrate the advantage of the proposed HAGPR model over a conventional GPR model in battery SoH prediction. The conclusions drawn from this research can be summarized as follows:

1) By introducing the ANFIS network based on the Takagi-Sugeno model, the HAGPR is capable of modelling the Gaussian process in battery aging with the extracted feature. Pearson correlation tests have suggested that the feature extracted by the ANFIS network has improved correlations with SoH (the PCC value can be up to 0.9911).
2) The proposed HAGPR is able to provide accurate SoH prediction with less learning data than the conventional GPR model. According to the performance evaluation of the models developed with different numbers of cycles for learning, HAGPR can save more than 31.8% of the aging testing compared with the GPR model.
3) The proposed HAGPR model can robustly outperform the conventional GPR model for SoH prediction. The MSE of SoH prediction is reduced by at least 25% for the selected lithium-ion batteries.
In the planned future work, the HAGPR model will be extended for online SoH prediction with real-time battery data from daily vehicle use. The upgraded SoH model can contribute to multi-objective optimization and real-time energy management control of electrified vehicles.

Fig. 5. Layout of the ANFIS network for feature extraction

The first hidden layer fuzzifies the inputs with triangular membership functions, $M_{1,h}$, $M_{2,i}$, $M_{3,j}$, and $M_{4,k}$.

Fig. 6. Inputs/output mapping of the ANFIS: a) F1 and F2 to F*; b) F1 and F3 to F*; c) F1 and F4 to F*; d) F2 and F3 to F*; e) F2 and F4 to F*; f) F3 and F4 to F*

To study the feasibility of the proposed ANFIS-based feature extraction method, the ANFIS network obtained with dataset No.6 is used to extract features in datasets No.5, No.7, and No.18. Pearson correlation coefficients (PCC) of the features ($F_1'$, $F_2'$, $F_3'$, $F_4'$, and $F^*$) and the target (battery SoH) are presented in the heatmaps in Fig. 6, where the PCC obtained with datasets No.6, No.5, No.7, and No.18 are illustrated in Fig. 6 a) to Fig. 6 d), respectively; and $F^{*(6)}$ is the extracted feature based on the ANFIS model that is trained with the dataset of battery No.6. The value (between -1 and 1) in the heatmap is the PCC value of its row-labelled item and column-labelled item; for example, the value of -0.321 in the second row and first column of Fig. 6 a) is the PCC value of $F_1'$ and $F_2'$. A PCC value with a magnitude closer to 1 means the two items linked to this PCC value have a stronger correlation. The proposed ANFIS-based method is shown to be effective in obtaining an extracted feature that has a high correlation with the battery SoH (PCC value up to 0.9911). In most cases, the extracted feature has a higher PCC value than the four original features. The conventional method in [35] needs all four features to estimate battery SoH because the PCC value rankings of these features to SoH differ between datasets (e.g., $F_4'$ has the second largest PCC value in dataset 6 but the minimal PCC value in dataset 7) and some features have very similar PCC values (e.g., $F_1'$ and $F_3'$ in dataset 5). Even in the worst case (with dataset 18), where the PCC value of the ANFIS extracted feature, $F^{*(6)}$, (0.8881) is less than that of $F_1'$ (0.909), the proposed method is still promising because it can reduce the feature number for SoH estimation from four to one.
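The correlation analysis can be reproduced in outline with NumPy's `corrcoef`. The sketch below uses synthetic data in place of the battery datasets, and the function name `pcc_to_target` is hypothetical. It computes both the PCC of each feature against SoH and the full matrix that a heatmap would display.

```python
import numpy as np

def pcc_to_target(features, soh):
    """PCC between each feature column and the SoH target."""
    features = np.asarray(features, dtype=float)
    return np.array([np.corrcoef(features[:, j], soh)[0, 1]
                     for j in range(features.shape[1])])

# Synthetic target and two synthetic features (illustrative only)
soh = np.linspace(1.0, 0.7, 50)
feat_a = 2.0 * soh + 1.0     # perfectly positively correlated with SoH
feat_b = -soh                # perfectly negatively correlated with SoH
features = np.column_stack([feat_a, feat_b])

pcc = pcc_to_target(features, soh)
# Full correlation matrix over features and target, as shown in a heatmap
heatmap = np.corrcoef(np.column_stack([features, soh]), rowvar=False)
```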

Fig. 10. Estimation and prediction performance with different batteries: a) No.5; b) No.7; and c) No.18.

By calculating the mean squared errors (MSE) of the estimation results, $\mathrm{MSE}_{est} = \sum (\mathrm{SoH}_{est} - \mathrm{SoH}_{real})^2 / N_{est}$, and the prediction results, $\mathrm{MSE}_{pre} = \sum (\mathrm{SoH}_{pre} - \mathrm{SoH}_{real})^2 / N_{pre}$, where $N_{est}$ and $N_{pre}$ are the numbers of sample points for estimation and prediction, the SoH estimation and prediction performances for different batteries are quantified in Table 4. The results indicate that the proposed HAGPR is more robust, achieving similar MSE values in SoH estimation and prediction. Although GPR can achieve lower MSE values in SoH estimation, the HAGPR model robustly outperforms the conventional GPR model for SoH prediction, reducing the MSE by at least 25%.
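For reference, the MSE metric and the relative MSE reduction between two models can be computed as in the sketch below. The function names and the numerical values are hypothetical and are not taken from Table 4.

```python
import numpy as np

def mse(soh_model, soh_real):
    """Mean squared error between modelled and measured SoH."""
    diff = np.asarray(soh_model, dtype=float) - np.asarray(soh_real, dtype=float)
    return float(np.mean(diff ** 2))

def mse_reduction(mse_baseline, mse_proposed):
    """Relative MSE reduction of the proposed model over the baseline."""
    return (mse_baseline - mse_proposed) / mse_baseline

# Illustrative values only
example_mse = mse([0.9, 0.8], [1.0, 0.8])
example_reduction = mse_reduction(4.0, 3.0)
```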