PHYSICAL MODEL-GUIDED MACHINE LEARNING FRAMEWORK FOR ENERGY MANAGEMENT OF VEHICLES
20200108732 · 2020-04-09
Inventors
- William Northrop (Minneapolis, MN, US)
- Andrew Kotz (Minneapolis, MN, US)
- PengYue Wang (Minneapolis, MN, US)
CPC classification
G06N7/01
PHYSICS
Y02T10/70
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06N3/006
PHYSICS
B60L58/12
PERFORMING OPERATIONS; TRANSPORTING
International classification
B60L58/12
PERFORMING OPERATIONS; TRANSPORTING
B60L50/60
PERFORMING OPERATIONS; TRANSPORTING
Abstract
A method of determining when to increase an amount of electrical energy available to a vehicle includes setting a parameter for a function describing a reference state of charge as a function of distance traveled, wherein the reference state of charge represents a state of charge of the vehicle at which the amount of electrical energy available to the vehicle should be increased. For each trip of the vehicle, the parameter for the function is modified so that different trips of a same vehicle use different functions for the reference state of charge.
Claims
1. A method of determining when to increase an amount of electrical energy available to a vehicle, the method comprising: setting at least one parameter for a function describing a reference state of charge as a function of distance traveled, wherein the reference state of charge represents a state of charge of the vehicle at which the amount of electrical energy available to the vehicle should be increased; for each trip of the vehicle, modifying at least one parameter for the function so that different trips of a same vehicle use different functions for the reference state of charge.
2. The method of claim 1 wherein modifying at least one parameter comprises modifying at least one parameter before a trip to form at least one modified parameter and using the at least one modified parameter for an entirety of the trip.
3. The method of claim 2 wherein modifying at least one parameter before a trip comprises modifying at least one parameter based on a prior model created at least in part from past trips of the vehicle.
4. The method of claim 3 wherein the prior model is created at least in part from a respective best value for each of the at least one parameters determined for a last trip of the vehicle, wherein the respective best value for each of the at least one parameters results in the least amount of electrical energy being made available to the vehicle while preventing the state of charge from crossing below a threshold during the last trip.
5. The method of claim 4 wherein the best value is determined using a vehicle model to estimate changes in the state of charge for the last trip.
6. The method of claim 1 wherein modifying at least one parameter comprises modifying at least one parameter during the trip.
7. The method of claim 6 wherein modifying at least one parameter comprises using a neural network to select a change in at least one parameter based on a state of the vehicle.
8. The method of claim 7 wherein the neural network is trained using reinforcement learning.
9. The method of claim 1 wherein the vehicle is a range extended hybrid electric vehicle.
10. The method of claim 1 wherein the vehicle is an all-electric vehicle.
11. A computer system comprising: a communication interface receiving trip information from a vehicle for a time period; a processor, receiving the trip information from the communication interface and performing steps comprising: using the trip information for the time period to change how a reference state of charge is determined, wherein the reference state of charge represents a state of charge of the vehicle at which an amount of electrical energy available to the vehicle should be increased.
12. The computer system of claim 11 wherein the time period covers the entirety of a previous trip.
13. The computer system of claim 12 wherein changing how the reference state of charge is determined comprises changing how the reference state of charge is determined based on a prior model of a parameter used to determine the reference state of charge.
14. The computer system of claim 13 wherein the prior model is created at least in part from a best value for the parameter determined for a last trip of the vehicle, wherein the best value results in the least amount of electrical energy being made available to the vehicle while preventing the state of charge from crossing below a threshold during the last trip.
15. The computer system of claim 14 wherein the best value is determined using a vehicle model to estimate changes in the state of charge for the last trip.
16. The computer system of claim 11 wherein the time period covers less than all of a trip in progress.
17. The computer system of claim 16 wherein using the trip information for the time period to change how the reference state of charge is determined comprises using the trip information to identify a state and applying the state to a neural network to obtain the change in how the reference state of charge is determined.
18. A computing device comprising: a memory storing trip information for a vehicle; a processor executing instructions to perform steps comprising: using at least some of the trip information to alter a function used to determine a reference state of charge, wherein the reference state of charge represents a state of charge of the vehicle at which an amount of electrical energy available to the vehicle should be increased and wherein the reference state of charge changes during a vehicle trip; and using the altered function to determine the reference state of charge.
19. The computing device of claim 18 wherein the trip information used to alter the function comprises trip information for an entirety of a latest trip.
20. The computing device of claim 19 wherein the trip information is used to alter a prior probability distribution and the prior probability distribution is used to alter the function.
21. The computing device of claim 18 wherein the trip information comprises trip information for a current trip.
22. The computing device of claim 21 wherein altering the function comprises determining a state from the trip information and applying the state to a neural network to determine how to alter the function.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
I. Introduction
[0024] Ideally, when driving an all-electric or range extended hybrid electric vehicle, the vehicle will complete the current trip without requiring recharging of the battery. Recharging an all-electric vehicle involves finding a charging station, plugging in the vehicle and waiting until a sufficient charge has been provided to the battery. This lengthens the time required to complete the current trip. Recharging a range extended hybrid electric vehicle involves running an internal combustion engine, which increases fuel costs and vehicle emissions. Thus, in both cases, it is best to limit the amount of battery recharging that takes place during a trip.
[0025] A significant challenge to limiting the amount of battery recharging that takes place during a trip is that it is extremely difficult to determine what battery state of charge should trigger recharging. In particular, as the end of the trip approaches, less charge is required to reach the end of the trip. Thus, the charge level that triggers recharging should decrease over the course of the trip. In the discussion below, the charge level that triggers recharging is referred to as the reference state of charge. In other words, the reference state of charge represents a state of charge that triggers an instruction to increase the amount of electrical energy available to the vehicle either by plugging the vehicle in or running an internal combustion engine.
[0026] In the embodiments discussed below, the reference state of charge is determined as a function of the distance traveled during the current trip. The embodiments provide techniques for modifying this function so that the function minimizes the amount of additional electrical energy that must be made available during a trip while ensuring that the charge of the battery does not fall below a threshold value at any point during the trip.
[0029] To maximize the energy use from the battery without running out of battery, the reference state of charge (SOC.sub.ref) is designed to reach a target SOC value at the end of the trip, denoted as SOC.sub.tev. The SOC.sub.ref is defined as:
SOC.sub.ref=min((1−f(d).sub.θ)×100%, 60%) (1)
[0030] where d is the distance a vehicle has traveled so far on a given trip and θ represents one or more parameters of the function f(d).sub.θ. The reference state of charge is capped at a maximum of 60% to prevent recharging of the battery when the state of charge is greater than 60%, thereby reducing fuel consumption and avoiding charging the battery too many times, which would degrade the battery's life. In the reference state of charge equation, f(d).sub.θ has a value between zero and 1−SOC.sub.tev. For example, for an SOC.sub.tev of 10%, f(d).sub.θ has a value between zero and 0.9. In some embodiments, f(d).sub.θ is a linear function such as:
f(d).sub.θ=0.9·(d/L.sub.set) (2)
[0031] where 0.9 and L.sub.set are the parameters of f(d).sub.θ. Ideally, if L.sub.set matches the actual total route distance, the vehicle finishes the trip with SOC.sub.tev and is charged at the depot using electricity from the grid at night, minimizing fuel consumption. However, L.sub.set is difficult to determine because it is difficult to estimate the trip distance accurately a priori. Vehicles in different delivery areas have very different distributions of trip distances day-to-day. Also, for an individual vehicle, the trip distances of actual routes vary from the scheduled distance and differ day-to-day based on delivery demand, driver behavior, vehicle weight, weather and traffic, for example, even though the vehicles might traverse the same region each day.
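As a minimal sketch of Equations 1 and 2 (function and parameter names are ours, not part of the claims; the linear form and the 60% cap follow the text):

```python
# Sketch of the reference-SOC schedule. theta here consists of the
# slope limit (1 - SOC_tev) and the assumed route length L_set.

def f_theta(d_km: float, l_set_km: float, soc_tev: float = 0.10) -> float:
    """Linear f(d)_theta, rising from 0 to (1 - SOC_tev) at d = L_set."""
    return min((1.0 - soc_tev) * d_km / l_set_km, 1.0 - soc_tev)

def soc_ref(d_km: float, l_set_km: float) -> float:
    """Reference state of charge in percent, capped at 60% per Eq. (1)."""
    return min((1.0 - f_theta(d_km, l_set_km)) * 100.0, 60.0)
```

With L_set equal to the actual route distance, the schedule decreases from the 60% cap to SOC_tev = 10% at the end of the trip.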
[0032] By changing the parameters of f(d).sub.θ, different functions for SOC.sub.ref can be created; for example, different values of L.sub.set shift the distance at which the reference state of charge reaches SOC.sub.tev.
[0033] In the embodiments described below, a vehicle model is used to calculate a best SOC.sub.ref function for each trip of the vehicle. Table I lists the parameters of the vehicle model.
TABLE I — VEHICLE MODEL PARAMETERS

  Symbol       Parameter
  c.sub.rr     Coefficient of rolling resistance
  C.sub.d      Coefficient of air resistance
  P.sub.b      Battery power
  P.sub.e      Engine/charger power
  P.sub.btw    Battery-to-wheel power
  P.sub.etw    Engine/charger-to-wheel power
  η.sub.btw    Efficiency from battery to wheel
  η.sub.etw    Efficiency from engine/charger to wheel
  ρ            Air density
  A            Frontal area
  m            Total mass
  V.sub.oc     Open circuit voltage
  R.sub.0      Battery internal resistance
  Q            Battery capacity
  f            Cumulated fuel use
[0034] The vehicle force demand can be written in the following form:
F.sub.demand=F.sub.acceleration+F.sub.roll+F.sub.air+F.sub.g (3)
Where
[0035]
F.sub.acceleration=m·a
F.sub.roll=c.sub.rr·m·g·cos(α)
F.sub.air=½·ρ·C.sub.d·A·v.sup.2
F.sub.g=m·g·sin(α) (4)
[0036] Neglecting the road grade and estimating the power as P=F·v gives:
P.sub.demand=m·a·v+c.sub.rr·m·g·v+½·ρ·C.sub.d·A·v.sup.3 (5)
[0037] The power in the case of an ReHEV and an all-electric vehicle is provided solely by the electric motor, which uses energy from the battery, P.sub.btw, and from an additional source of electrical energy (the combustion engine for range-extended hybrid electric vehicles and an external charger for all-electric vehicles), P.sub.etw, so that:
P.sub.demand=P.sub.btw+P.sub.etw (6)
where
P.sub.btw=(P.sub.b−P.sub.accessory)·η.sub.btw
P.sub.etw=P.sub.e·η.sub.etw (7)
[0038] Neglecting the power consumption of accessories, the power of the battery is:
P.sub.b=P.sub.btw/η.sub.btw (8)
[0039] By assuming η.sub.btw, η.sub.etw, m, g, c.sub.rr, A, C.sub.d and ρ are all constants, and using the fact that P.sub.e is constant (neglecting the transition process from on to off and off to on), this equation can be rewritten with dependence on time t as:
P.sub.b(t)=A·v(t)+B·v.sup.3(t)+C·a(t)·v(t)−D (9)
[0040] where A, B, C and D are constants determined by the vehicle parameters of Table I and the engine/charger power P.sub.e.
Battery Model
[0041] A simplified battery model is used to model the battery pack:
P.sub.b(t)=V.sub.oc(s)·I(t)−R.sub.0(s)·I.sup.2(t) (11)
[0042] V.sub.oc(s) and R.sub.0(s) depend on the SOC, s. The derivative of s is proportional to the current at the battery terminals:
{dot over (s)}(t)=−I(t)/Q (12)
[0043] Solving for the current from the battery power equation and substituting it into the above equation gives:
{dot over (s)}(t)=−(V.sub.oc(s)−{square root over (V.sub.oc.sup.2(s)−4·R.sub.0·P.sub.b(t))})/(2·R.sub.0·Q) (13)
[0044] V.sub.oc(s) can be modeled as a piecewise linear function of the SOC and R.sub.0(s) can be modeled as a constant, R.sub.0(s)=R.sub.0. By combining Equations 9 and 13, the velocity profile can be used as input to calculate the SOC profile step by step given the initial SOC:
s(t+Δt)=s(t)+{dot over (s)}(t)·Δt (14)
[0045] If the vehicle is stopped and the engine is on or the charger is plugged in, the SOC update is simply:
s(t+Δt)=s(t)+C.sub.charging rate·Δt (15)
[0046] In our case, Δt=1, which means the step size is 1 second.
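The step-by-step SOC update of Equations 11-15 can be sketched as follows; the battery constants (V.sub.oc, R.sub.0, Q) are illustrative placeholders, and V.sub.oc is treated as a constant rather than a piecewise linear function for brevity:

```python
import math

def soc_step(s, p_b, v_oc=350.0, r0=0.2, q_as=200 * 3600, dt=1.0):
    """One forward-Euler SOC step (Eqs. 11-14): solve the battery current
    from P_b = V_oc*I - R0*I^2, then integrate s_dot = -I/Q.
    q_as is the battery capacity in ampere-seconds (here 200 Ah)."""
    # The smaller root of the quadratic is the physically meaningful current.
    i = (v_oc - math.sqrt(v_oc**2 - 4.0 * r0 * p_b)) / (2.0 * r0)
    return s + (-i / q_as) * dt

def soc_step_charging(s, charging_rate_per_s, dt=1.0):
    """Eq. (15): SOC update while stopped with the engine on or charger plugged in."""
    return s + charging_rate_per_s * dt
```

Iterating `soc_step` over a 1-second velocity-derived power profile reproduces the step-by-step SOC calculation described above.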
Engine Model
[0047] In accordance with one embodiment, the engine is modeled as working in a fixed condition where the fuel rate and engine charging power are both constant. The transition process from off to on and on to off is neglected. So, when the engine is turned on:
f(t+Δt)=f(t)+C.sub.fuel rate·Δt (16)
[0048] To use this vehicle model, data is collected from the vehicle using on-board diagnostics measurements. Measured parameters include the status of the power system (e.g., SOC, additional electrical energy made available to the battery), the vehicle's movement (e.g., odometer, speed), and others (e.g., fuel consumption, emissions). In accordance with one embodiment, 355 parameters per vehicle in total are recorded with the timestamp and the vehicle's location every five seconds when the vehicle is running.
[0049] Data from the vehicles were stored in a secure spatial database instance with support for geometry objects and spatial indexes. The database schema consists of three main tables:
[0050] Vehicle, TripSummary, and EGENDriveTrip.
[0051] The Vehicle table records properties of each vehicle, such as the make, model, and year. Every record in TripSummary is a summary of a single delivery trip of a vehicle. Each summary contains attributes such as the starting date and time, duration, and distance. In addition, each summary is associated with a trip trajectory, which is composed of a series of spatial points stored in the EGENDriveTrip table. Each record in EGENDriveTrip describes the spatial location of a vehicle at a specific timestamp. It also contains the on-board diagnostics measurements of the vehicle at that time. To ensure data security, a virtual machine is employed to process and import data.
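The three-table schema described above can be sketched as follows. This is an illustration only: column names beyond the attributes named in the text (make, model, year; start time, duration, distance; location and measurements per timestamp) are assumptions, and an in-memory SQLite database stands in for the spatial database instance:

```python
import sqlite3

# Minimal sketch of the Vehicle / TripSummary / EGENDriveTrip schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    make TEXT, model TEXT, year INTEGER);
CREATE TABLE TripSummary (
    trip_id INTEGER PRIMARY KEY,
    vehicle_id INTEGER REFERENCES Vehicle(vehicle_id),
    start_time TEXT, duration_s REAL, distance_km REAL);
CREATE TABLE EGENDriveTrip (
    trip_id INTEGER REFERENCES TripSummary(trip_id),
    ts TEXT,                    -- timestamp of the spatial point
    lat REAL, lon REAL,         -- vehicle location
    soc REAL, speed REAL);      -- on-board diagnostics measurements
""")
```

Each EGENDriveTrip row corresponds to one 5-second sample of a trip trajectory, linked to its TripSummary record.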
Data Preprocessing
[0052] Data quality is crucial to the accuracy of the vehicle simulation. However, raw data from on-board diagnostics frequently contain errors. Three common problems are low resolution, missing values, and wrong values. In the dataset used here, the low update rate results in a stepped profile for vehicle distance, requiring interpolation. Velocity occasionally remains at 0 even when the corresponding distance increases. The low resolution of the velocity profile can degrade the accuracy of the model that solves the SOC step by step, and missing values can introduce large error. For example, if several zero-velocity data points are missing between two non-zero velocity data points, the model will connect the two non-zero velocities linearly and conclude that the vehicle did not stop. To solve these problems, a data preprocessing procedure is applied iteratively. Interpolation and Gaussian filters are first used to correct low resolution and missing values; these methods then use the information in the distance profile to correct wrong velocity values iteratively.
[0053] The trip-level data preprocessing procedure is:
Step 1
[0054] Zero-filling and forward-filling for the velocity profile and distance profile respectively to fill in the missing values;
Step 2
[0055] For both profiles, interpolate the 5 second data into 1 second data linearly;
Step 3
[0056] Use a Gaussian filter to process the distance profile and velocity profile to obtain smoothed profiles; the degree of smoothness is determined by σ.sub.1 and σ.sub.2;
Step 4
[0057] Calculate a new velocity profile from the smoothed distance profile by a second-order finite difference method (for the first and last data points, velocity is zero):
v(t)=(d(t+Δt)−d(t−Δt))/(2·Δt) (17)
Step 5
[0058] Compare every point of the smoothed velocity profile with the corresponding point of the new velocity profile calculated from the smoothed distance profile; wherever the smoothed velocity is 0 but the new velocity is non-zero, replace the smoothed value with the non-zero value multiplied by a factor E;
Step 6
[0059] Calculate a new distance profile from the smoothed and corrected velocity profile; if the final distance has an error smaller than 500 m, preprocessing is finished. Otherwise, go back to step 5 and update E according to the error until the stopping criterion is satisfied.
[0060] As the actual velocity profile should be continuous and the velocity and acceleration cannot be too large, a Gaussian filter is used to infer the distance and velocity information between the 5 second data points. The smoothing also substantially improves the quality of the distance profile so that a new velocity profile can be computed in step 4. Without the Gaussian filter, the velocity calculated from the distance profile would yield unrealistically high velocities at the points where the distance changes, and at some data points the acceleration calculated from the unsmoothed velocity profile would be too high. Step 5 corrects wrong velocity values; however, the velocity calculated from the smoothed distance profile is not exact, requiring the factor E to scale it. This procedure refines the data at the trip level.
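Steps 1-4 of the preprocessing procedure can be sketched as below; the σ values are illustrative, not the calibrated ones, and missing-value filling is subsumed into the interpolation for brevity:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def preprocess_trip(t5, d5, v5, sigma1=2.0, sigma2=2.0):
    """Interpolate 5 s distance/velocity samples to a 1 s grid, smooth
    both profiles with a Gaussian filter, and recompute velocity from the
    smoothed distance by central (second-order) finite differences."""
    t1 = np.arange(t5[0], t5[-1] + 1)          # step 2: 1 s time grid
    d1 = np.interp(t1, t5, d5)                 # linear interpolation of distance
    v1 = np.interp(t1, t5, v5)                 # and of velocity
    d_smooth = gaussian_filter1d(d1, sigma1)   # step 3: smoothness set by sigma1
    v_smooth = gaussian_filter1d(v1, sigma2)   # and sigma2
    v_new = np.gradient(d_smooth, t1)          # step 4: central differences
    v_new[0] = v_new[-1] = 0.0                 # first/last velocity forced to zero
    return d_smooth, v_smooth, v_new
```

Steps 5-6 (the E-factor correction loop) would then compare `v_smooth` against `v_new` and iterate until the reconstructed distance error falls below 500 m.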
[0061] Below, two Energy Management Strategy (EMS) embodiments are discussed for optimizing the SOC.sub.ref function. The first is referred to as a Bayesian Algorithm, which defines a prior probability distribution for a parameter of the SOC.sub.ref function and then uses that prior probability distribution to update the parameter, and thus change the SOC.sub.ref function, after each trip of the vehicle. The second embodiment is referred to as a Reinforcement Learning Algorithm, which uses a neural network to select a change to a parameter of the SOC.sub.ref function, and thus change the SOC.sub.ref function, while a trip is in progress.
Bayesian Algorithm
[0062] A naive approach for programming individual vehicles would be to determine the parameter for a future trip by using a best θ determined over all historical trips. However, it is not straightforward to estimate the proper parameters because there is uncertainty about future trips. Further, if a vehicle has made only a few trips, the statistical strength of such a prediction will be low. To address this uncertainty, the present embodiments model the distribution of the best θ of each vehicle using a Gaussian distribution. For new vehicles, or for vehicles driving new route profiles, the number of trips is very small or zero, making it nearly impossible to obtain a good estimate of the distribution. To deal with this problem, the parameters of the Gaussian distribution (mean and precision) are estimated using a Bayesian algorithm. The distribution parameters are determined by both data and prior knowledge, and every time new trip data become available, they are updated adaptively. Once the distribution parameters are updated, the parameters of the SOC.sub.ref function are calculated conservatively by the cumulative density function (CDF) of the posterior predictive model.
[0063] The actual best parameter θ is assumed to follow a Gaussian distribution with unknown mean and unknown precision:
p(θ)=N(θ|μ,λ.sup.−1) (18)
where μ is the unknown mean and λ is the unknown precision, defined as the reciprocal of the variance.
[0064] To simplify the notation, θ.sup.[N] and {circumflex over (θ)}.sup.[N] represent the actual best θ and the predicted θ for the Nth trip.
[0065] Given historical data from N trips, the likelihood can be written in the form:
p(θ.sup.[1],θ.sup.[2] . . . θ.sup.[N]|μ,λ)=Π.sub.n=1.sup.N N(θ.sup.[n]|μ,λ.sup.−1) (19)
[0066] If {circumflex over (θ)} is calculated using a distribution estimated only from the historical data by maximizing the likelihood in Equation 19, then when the data set is small or empty, the calculated {circumflex over (θ)} will be highly unstable, leading to undesirable vehicle performance in reality. To solve this problem, a prior probability distribution is introduced to make the model more conservative.
[0067] Assuming the posterior is proportional to the product of the prior and the likelihood, the form of posterior is given as:
p(μ,λ|θ.sup.[1],θ.sup.[2] . . . θ.sup.[N])∝p(μ,λ)·p(θ.sup.[1],θ.sup.[2] . . . θ.sup.[N]|μ,λ) (20)
[0068] By introducing a prior distribution, μ and λ are estimated based on both the information from the data and our prior knowledge, which gives a more conservative estimate for small N. The concept of a conjugate prior from Bayesian probability theory is used, which considerably simplifies the analysis: if a prior distribution is conjugate to the likelihood function of a given distribution, the posterior distribution will have the same form as the prior. The conjugate prior for a Gaussian distribution with unknown mean and unknown precision is the Normal-Gamma distribution: p(μ,λ)=NormalGamma(μ.sub.0,κ.sub.0,a.sub.0,b.sub.0).
[0069] So, the posterior distribution is also Normal-Gamma:
p(μ,λ|θ.sup.[1],θ.sup.[2] . . . θ.sup.[N])=NormalGamma(μ.sub.N,κ.sub.N,a.sub.N,b.sub.N) (21)
[0070] To make a prediction of θ for the next trip, we integrate over the posterior:
[0071] Given the prior and historical data, the posterior predictive model for the next {circumflex over (θ)} is a t-distribution. Robustness is one of the main characteristics of the t-distribution: it has longer tails than a Gaussian distribution, which means its position and shape are less sensitive to outliers, an advantage in this application. There are four parameters in the t-distribution: μ.sub.N, κ.sub.N, a.sub.N and b.sub.N. These are determined by the historical data and the parameters of the prior distribution: μ.sub.0, κ.sub.0, a.sub.0 and b.sub.0. It is important to understand the meaning of these parameters in order to design a good prior. To make this more straightforward, consider a new group of parameters: μ.sub.0, n.sub.μ, λ.sub.0 and n.sub.λ, where n.sub.μ and n.sub.λ are the numbers of pseudo-samples from which the prior mean μ.sub.0 and the prior precision λ.sub.0 are estimated.
[0072] To determine a prior, the only four parameters that need to be specified are μ.sub.0, n.sub.μ, λ.sub.0 and n.sub.λ.
[0073] The procedure for calculating {circumflex over ()} and updating parameters is described as follows:
1) Initialization Step.
[0074] The initial t-distribution is completely determined by the prior when no trip information is available (N=0).
2) Prediction Step
[0075] The prediction step is based on the CDF of the t-distribution. The value of the CDF evaluated at {circumflex over (θ)} is the probability that the next actual best L.sub.set will take a value less than or equal to our predicted L.sub.set:
CDF.sub.θ({circumflex over (θ)})=P(θ≤{circumflex over (θ)}) (25)
[0076] We determine {circumflex over (θ)}.sup.[N+1] by setting the CDF=0.99, which means θ.sup.[N+1] will be smaller than our calculated value with a probability of 0.99 under our assumption. It can therefore be seen that the calculated {circumflex over (θ)} will be higher than the actual ideal θ by a margin in most trips. For real-world driving, a low θ leading to a very low SOC during a trip should be avoided to a high confidence level, even at the expense of a smaller improvement in fuel economy.
3) Update Step
[0077] After a new trip is observed, the parameters in the prior are updated by the parameters in the posterior; i.e., after new data is recorded, the previous posterior becomes the prior for the new information:
μ.sub.0.sup.new=μ.sub.N.sup.old
κ.sub.0.sup.new=κ.sub.N.sup.old
a.sub.0.sup.new=a.sub.N.sup.old
b.sub.0.sup.new=b.sub.N.sup.old (26)
[0078] The parameters in the posterior are then updated using Equation 21 with N=1, m=θ and s.sup.2=0 according to:
[0079] where m=θ is the best SOC.sub.ref function parameter for the latest trip. To find this best θ, a simplified vehicle model (described below) is run iteratively over the preprocessed velocity profile from the latest trip using different values of θ to control when the model vehicle is provided with additional electrical energy. The simplified vehicle model predicts the amount of charge that the vehicle will use during different parts of a velocity profile and predicts the rate at which the vehicle will receive charge when additional electrical energy is provided to the vehicle. For each tested value of θ, the minimal SOC of the trip and the amount of additional electrical energy provided to the vehicle are recorded. The value of θ that requires the least amount of additional electrical energy while keeping the state of charge above a threshold (such as 10%) is then selected as the best θ for the latest trip and is used to update the prior probability distribution parameters as shown in Equation 27.
[0080] After updating the parameters of the t-distribution, {circumflex over (θ)} for the next trip can be calculated by the prediction step.
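The one-trip update and the CDF=0.99 prediction can be sketched with the standard Normal-Gamma conjugate-update formulas, which are consistent with the N=1, s.sup.2=0 update described above; the numeric values in the usage note and the mapping from (λ.sub.0, n.sub.λ) to (a.sub.0, b.sub.0) are illustrative assumptions:

```python
from math import sqrt
from scipy.stats import t as student_t

def update(mu0, k0, a0, b0, m):
    """One-trip Normal-Gamma posterior update (N = 1, sample variance 0);
    m is the best parameter found for the latest trip."""
    mu = (k0 * mu0 + m) / (k0 + 1.0)
    k = k0 + 1.0
    a = a0 + 0.5
    b = b0 + k0 * (m - mu0) ** 2 / (2.0 * (k0 + 1.0))
    return mu, k, a, b

def predict(mu, k, a, b, cdf=0.99):
    """Posterior-predictive Student-t; return the parameter value whose CDF
    equals 0.99 (Eq. 25), i.e. a deliberately conservative over-estimate."""
    scale = sqrt(b * (k + 1.0) / (a * k))
    return student_t.ppf(cdf, df=2.0 * a, loc=mu, scale=scale)
```

For example, with a prior mean of 74 and 5 pseudo-samples (κ.sub.0=5), observing a best parameter of 80 pulls the posterior mean toward 80, and `predict` returns a value well above the mean, matching the conservative margin described above.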
Designing of the Prior
[0081] In this section, a prior is designed for use in the Bayesian algorithm by determining the parameters μ.sub.0, n.sub.μ, λ.sub.0 and n.sub.λ.
[0082] For each vehicle, the initialization, prediction and update steps described above are performed using the best θ for each historic trip of the vehicle and a common set of prior probability parameters that are being tested. This produces a curve of predicted {circumflex over (θ)} for each vehicle that starts far from the curve of best θ and descends toward it as the number of historic trips increases. For example,
TABLE II — BAYESIAN MODEL PARAMETERS

  Parameter   Value
  μ.sub.0     74
  n.sub.μ     5
  λ.sub.0     0.01
  n.sub.λ     50
[0083] The underlying meaning of the parameters is that the prior mean is 74, estimated from 5 pseudo-samples, and the prior precision is 0.01, estimated from 50 pseudo-samples. A low number of pseudo-samples is used for the mean and a relatively high number for the precision because, when little data is available, the variance of the data can be very large, leading to a very small precision; using more pseudo-samples for the precision keeps the model stable.
[0084] The initial posterior predictive distribution 500 of θ, determined only by the prior probability distribution, and the final posterior predictive distribution 502, using all trip data of vehicle C, are shown in
Simulation and in-Use Data Study
Validation of the Vehicle Model
[0085] The accuracy of the vehicle model is very important to the developed framework. As the engine on/off control logic is based on the SOC value, validation of the model is based mainly on the SOC curve. Starting with the same vehicle model, the parameters were calibrated for different vehicles on each route using several trips. After calibration, the model performs consistently on the other trips for the same vehicle. Error comes from the simplification of the vehicle model, which neglects wind speed and road grade and assumes constant vehicle component efficiencies. Also, noisy and low-resolution raw data introduce error even after preprocessing. Furthermore, the SOC value in the raw data itself contains some level of error, as the SOC is not measured directly; some degree of error is inevitable in all vehicle measurement datasets. As an example, raw SOC data 602 and simulated SOC data 600 for one vehicle are shown in
Additional Electrical Energy with Different θ
[0086] The additional electrical energy provided in a particular trip is a function of θ. It was observed that a reduction in additional electrical energy is not guaranteed when θ is lowered. Also, the amount of additional electrical energy will not increase once θ is higher than a particular value. For example, graph 700 of
Fuel Efficiency Improvement
[0087] For range extended hybrid electric vehicles, the fuel efficiency improvement achieved using an energy management system of the various embodiments can be quantified by fuel use and by miles per gallon equivalent (MPGe). MPGe is estimated by the equation:
[0088] MPGe improvement has been demonstrated on five range extended hybrid electric vehicles on real-world delivery trips, two of which have more than 15 trips.
[0089] All fuel reduction data for the five demonstration range extended hybrid electric vehicles is summarized in table III.
TABLE III — FUEL EFFICIENCY IMPROVEMENT

  Vehicle   Average MPGe       Average Fuel      Number
            Improvement (%)    Reduction (%)     of Trips
  E         9.0                11.0              15
  F         8.7                13.9              35
  G         7.9                11.3              2
  H         11.8               16.1              2
  I         5.8                9.1               2
Reinforcement Learning Algorithm
[0090] The reinforcement learning algorithm embodiment trains a neural network to change the parameter of the SOC.sub.ref function during a vehicle trip. The embodiment uses an actor-critic based algorithm called a deep deterministic policy gradient (DDPG) to train the neural network.
[0091] In reinforcement learning problems, an agent neural network and an environment interact with each other through state (s), reward (r) and action (a). At the beginning of the interaction, the initial state s.sub.0 is provided by the environment. The agent observes s.sub.0 and calculates an action a.sub.0 according to its policy π: s.fwdarw.a, which is a mapping from state to action. The environment receives a.sub.0 and outputs the immediate reward r.sub.0 and the next state s.sub.1. The interaction continues until a terminal state s.sub.T is reached, giving rise to a sequence like:
s.sub.0,a.sub.0,r.sub.0,s.sub.1,a.sub.1,r.sub.1,s.sub.2,a.sub.2,r.sub.2 . . . s.sub.T(29)
[0092] The goal of the agent is to maximize its discounted cumulated reward at each time step through a sequential decision-making process. The discounted cumulated reward at time step t is defined as:
G.sub.t=r.sub.t+γ·r.sub.t+1+γ.sup.2·r.sub.t+2+ . . . (30)
[0093] where γ is the discount factor, ranging from 0 to 1. When γ equals 1, the agent is farsighted, considering all future rewards equally; when γ is set to 0, the agent considers only the immediate reward, being myopic. Because a reward in the future is less certain than the immediate reward, γ is often set to a number slightly smaller than 1; for example, 0.99 is widely used.
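The discounted return of Equation 30 can be computed in a single right-to-left pass; this is a minimal sketch of the quantity the agent maximizes, not the DDPG training loop itself:

```python
def discounted_return(rewards, gamma=0.99):
    """G_t at t = 0 per Eq. (30): sum of gamma^k * r_{t+k}, computed
    right-to-left so each step reuses the partial sum."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With gamma = 1 every reward counts equally; with gamma = 0 only the first reward survives, matching the farsighted/myopic extremes described above.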
[0094] A classical formulation of sequential decision-making problem is the Markov Decision Process (MDP). Under this formulation, the environment responds to the agent by transition probability p(s.sub.t+1, r.sub.t|s.sub.t, a.sub.t) which only considers the current time step t, ignoring all previous history.
[0095] The problem of selecting the parameter of the SOC.sub.ref function is first formulated as an MDP below and then the RL algorithm used to solve the MDP is discussed.
[0096] An MDP can be represented by a tuple (s, a, p, r, γ). The agent in the present embodiment is the policy that can update θ during the trip. The environment p(s.sub.t+1,r.sub.t|s.sub.t,a.sub.t) is approximated by historical delivery trips and the vehicle model. The historical trips have various distances and energy intensities, which help the agent learn a generalized strategy for different conditions. The vehicle model is used to calculate the SOC change and the amount of additional electrical energy required given the velocity profiles. The additional electrical energy required is then used to calculate the reward and, consequently, the next state. The state is the real-time information that the agent has access to, represented as a vector at each time step:
s.sub.t=[t.sub.travel, d, SOC, f, x, y, .beta.] (31)
[0097] where t.sub.travel is the travelled time, d is the travelled distance, SOC is the current state of charge, f is the current total amount of additional electrical energy, x and y are the GPS coordinates, and .beta. is the current parameter setting of the SOC.sub.ref function. The agent can only observe this information when making a decision.
[0098] The action space is a predefined range:
a.sub.t.di-elect cons.[-A.sub.max, A.sub.max] (32)
[0099] In accordance with one embodiment, a.sub.t is the amount by which .beta. is changed. In some embodiments, if the magnitude of the change is below a threshold, .beta. is not changed during the time step.
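A minimal sketch of this update rule, assuming illustrative values for A.sub.max and the threshold (the names beta, a_max, and threshold are hypothetical, not from the patent):

```python
def apply_action(beta, a, a_max=0.1, threshold=0.01):
    # Clip the raw action to the predefined range [-A_max, A_max] of Eq. (32).
    a = max(-a_max, min(a_max, a))
    # Dead band: changes smaller than the threshold leave beta untouched,
    # avoiding needless parameter churn during the time step.
    if abs(a) < threshold:
        return beta
    return beta + a
```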
[0100] The reward at each time step t is defined as:
r.sub.t=r.sub.f t.sub.f,t+r.sub.SOC t.sub.SOC,t+r.sub.a,t+r.sub.c (33)
[0101] where the first term penalizes providing additional electrical energy and its magnitude is proportional to the time t.sub.f,t spent providing additional electrical energy, with coefficient r.sub.f equal to 0.001. The second term penalizes the condition of SOC lower than 10% and its magnitude is proportional to the amount of time under that condition, t.sub.SOC,t, with coefficient r.sub.SOC equal to 0.060. To guide the algorithm in finding an efficient policy, r.sub.a,t is added to penalize actions that change .beta.; it imposes a penalty of 0.020 each time .beta. is changed.
[0102] The first term r.sub.f t.sub.f,t penalizes all additions of electrical energy during the trip because the remaining distance and energy intensity are unknown. However, for trips exceeding the vehicle's all-electric range, the additional electrical energy is necessary to keep the SOC larger than 10%. Consequently, to compensate for the negative reward caused by the necessary additional electrical energy, a reward term r.sub.c is imposed at the end of the trip. After the trip is finished, the amount of fuel that is necessary to keep the SOC larger than 10% is simulated using the vehicle model, which determines the magnitude of r.sub.c. For example, if after a delivery trip it is calculated that 1 gallon of fuel was necessary to keep the SOC larger than 10% during the trip and the actual fuel use was 1.5 gallons, the negative reward caused by that 1 gallon is compensated by the r.sub.c term.
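The per-step reward of Eq. (33) can be sketched as follows; the sign convention (penalties negative, compensation positive) is an assumption inferred from the description, and the function name is hypothetical:

```python
# Coefficients stated in paragraph [0101].
R_F = 0.001    # per unit time spent providing additional electrical energy
R_SOC = 0.060  # per unit time spent with SOC below 10%
R_A = 0.020    # flat penalty whenever the SOC_ref parameter is changed

def step_reward(t_f, t_soc_low, param_changed, r_c=0.0):
    # Eq. (33): penalties enter negatively; r_c is the end-of-trip
    # compensation term (zero at every step except the last).
    r = -R_F * t_f - R_SOC * t_soc_low
    if param_changed:
        r -= R_A
    return r + r_c
```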
Training
[0103] The neural network represents a deterministic policy .mu. (the actor) and is parametrized by .theta..sup..mu.. Given a state s.sub.t, the corresponding action is calculated by:
a.sub.t=.mu.(s.sub.t|.theta..sup..mu.) (34)
[0104] The performance J of the policy .mu.(s|.theta..sup..mu.) is defined as:
J(.theta..sup..mu.)=E.sub..tau..about.p(.tau.|.theta..sup..mu.)[r(.tau.)] (35)
where .tau. is a trajectory of interaction:
.tau.: s.sub.0, a.sub.0, s.sub.1, a.sub.1, s.sub.2, a.sub.2 . . . s.sub.T (36)
p(.tau.|.theta..sup..mu.) is the probability of observing trajectory .tau. under the policy:
p(.tau.|.theta..sup..mu.)=p(s.sub.0).PI..sub.t=0.sup.T-1p(s.sub.t+1|s.sub.t, .mu.(s.sub.t|.theta..sup..mu.)) (37)
and r(.tau.) is the total discounted reward for a trajectory .tau.:
r(.tau.)=.SIGMA..sub.t=0.sup.T.gamma..sup.tr(s.sub.t, a.sub.t) (38)
[0105] J(.theta..sup..mu.) thus represents the expected discounted cumulated reward that the policy .mu.(s|.theta..sup..mu.) can receive from the environment. To maximize it, the parameters .theta..sup..mu. are updated by gradient ascent:
.theta..sup..mu..rarw..theta..sup..mu.+.alpha..gradient..sub..theta..sup..mu.J(.theta..sup..mu.) (39)
where .alpha. is the learning rate.
[0106] The policy gradient .gradient..sub..theta..sup..mu.J(.theta..sup..mu.) is calculated using the deterministic policy gradient theorem:
.gradient..sub..theta..sup..mu.J(.theta..sup..mu.)=E.sub.s[.gradient..sub.aQ(s, a|.theta..sup.Q)|.sub.a=.mu.(s).gradient..sub..theta..sup..mu..mu.(s|.theta..sup..mu.)] (40)
where Q(s, a|.theta..sup.Q) is the action-value function (the critic), parametrized by .theta..sup.Q:
Q(s, a|.theta..sup.Q).apprxeq.Q.sup..mu.(s, a)=E[G.sub.t|s.sub.t=s, a.sub.t=a] (41)
[0107] The action-value function represents the expected discounted cumulated reward G.sub.t that can be received by taking action a at state s and then following policy .mu.(s|.theta..sup..mu.).
[0108]
[0109] Experience replay and target networks with soft updates are used to stabilize the training process. Experience replay breaks the correlations between the transition pairs used to update the actor and critic. The target networks provide a more stable target for the updates. Gaussian exploration noise is added to the action during training to keep the algorithm exploring different actions under different states.
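A minimal sketch of these two stabilization mechanisms (uniform-sampling replay buffer and soft target update); the capacity, tau value, and flat weight-list representation are illustrative assumptions, not taken from the patent:

```python
import random
from collections import deque

class ReplayBuffer:
    # Uniform random sampling from stored (s, a, r, s_next) transitions
    # breaks the temporal correlations between consecutive transitions.
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def soft_update(target_weights, online_weights, tau=0.001):
    # Move each target-network weight a small step toward the online
    # network, giving the updates a slowly changing, stable target.
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_weights, online_weights)]
```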
[0110] In accordance with one embodiment, both the critic and the actor are feedforward neural networks with two hidden layers and one output unit (64 units in the first hidden layer and 48 units in the second). The activation function used in the hidden layers is the Rectified Linear Unit, which has the form f(x)=max(0, x). There is no activation function for the output unit of the critic. A hyperbolic tangent function (tan h) is used in the output unit of the actor to bound the output to the range (-1, +1). The Adam optimizer is used to learn the parameters of the two neural networks, with learning rates of 10.sup.-4 and 10.sup.-3 for the actor and critic, respectively. The neural networks are trained on 52 historical delivery trips with a distance range of 39 to 56 miles for 800 epochs (M=800 and N=52 in the method shown in
[0111]
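A sketch of the actor's forward pass with the stated architecture (7 assumed state features from Eq. (31), hidden layers of 64 and 48 ReLU units, and a tanh-bounded scalar output); the initialization scale and function names are hypothetical:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: f(x) = max(0, x)
    return np.maximum(0.0, x)

def init_layers(sizes, rng):
    # Small random weights and zero biases for each consecutive layer pair.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def actor_forward(params, s):
    # ReLU hidden layers; tanh on the output bounds the action to (-1, +1).
    h = s
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return np.tanh(h @ W + b)

rng = np.random.default_rng(0)
actor = init_layers([7, 64, 48, 1], rng)  # 7 state features -> 64 -> 48 -> 1
action = actor_forward(actor, np.ones(7))
```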
Testing
[0112] To evaluate the performance of the trained DDPG solution, it was tested on 51 delivery trips. The distance range of the test trips was from 31 to 54 miles.
[0113] The resulting neural network is robust and can handle conditions that were not seen during training. For example, the trips in the training data all began with a 100% state of charge. However, the neural network is able to handle a lower initial state of charge as shown in
[0114] The two embodiments described above can be implemented in a system such as the system shown in
[0115] Processor 1504 uses the value of .beta. returned by cloud servers 1510 to calculate SOC.sub.ref and then compares the current state of charge to SOC.sub.ref. When the current state of charge is less than SOC.sub.ref, processor 1504 issues an instruction to provide additional electrical energy to the battery, either by instructing the internal combustion engine to start or by sending a notification to recharge the battery using an external charger.
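The comparison logic of this paragraph can be sketched as follows; the linear SOC.sub.ref form, its parameter values, and the function names are purely illustrative assumptions (the patent defines the actual SOC.sub.ref function elsewhere):

```python
def soc_ref(beta, distance, soc_start=1.0, soc_floor=0.1):
    # Hypothetical linear reference: SOC_ref falls from soc_start at a
    # rate of beta per mile, never dropping below the 10% floor.
    return max(soc_floor, soc_start - beta * distance)

def needs_additional_energy(soc, beta, distance):
    # Paragraph [0115]: request an engine start (or a recharge
    # notification) once the measured SOC drops below the reference.
    return soc < soc_ref(beta, distance)
```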
[0116] While the embodiments above are discussed in connection with delivery vehicles, the embodiments are applicable to other types of vehicles including commuter vehicles, personal vehicles that follow similar routes on different days, waste disposal vehicles, and police vehicles, for example. In addition, although the embodiments have been described with reference to range extended hybrid electric vehicles, the embodiments can also be applied to multi-mode pure electric vehicles, where the vehicle enters different powertrain operation modes that consume less charge (limited acceleration and/or speed) in order to stay above the SOC reference, or that limit the power used by heating/air-conditioning units and internal power outlets, for example. The embodiments can also be used in a parallel hybrid vehicle where the combustion engine is used to drive the powertrain so that the SOC is held constant and above the SOC reference. When the SOC reference is sufficiently below the SOC, the electric motor is engaged and the combustion engine is turned off.
[0117]
[0118] Embodiments of the present invention can be applied in the context of computer systems other than computing device 10. Other appropriate computer systems include handheld devices, multi-processor systems, various consumer electronic devices, mainframe computers, and the like. Those skilled in the art will also appreciate that embodiments can also be applied within computer systems wherein tasks are performed by remote processing devices that are linked through a communications network (e.g., communication utilizing Internet or web-based software systems). For example, program modules may be located in either local or remote memory storage devices or simultaneously in both local and remote memory storage devices. Similarly, any storage of data associated with embodiments of the present invention may be accomplished utilizing either local or remote storage devices, or simultaneously utilizing both local and remote storage devices.
[0119] Computing device 10 further includes an optional hard disc drive 24, an optional external memory device 28, and an optional optical disc drive 30. External memory device 28 can include an external disc drive or solid state memory that may be attached to computing device 10 through an interface such as Universal Serial Bus interface 34, which is connected to system bus 16. Optical disc drive 30 can illustratively be utilized for reading data from (or writing data to) optical media, such as a CD-ROM disc 32. Hard disc drive 24 and optical disc drive 30 are connected to the system bus 16 by a hard disc drive interface 32 and an optical disc drive interface 36, respectively. The drives and external memory devices and their associated computer-readable media provide nonvolatile storage media for the computing device 10 on which computer-executable instructions and computer-readable data structures may be stored. Other types of media that are readable by a computer may also be used in the exemplary operation environment.
[0120] A number of program modules may be stored in the drives and RAM 20, including an operating system 38, one or more application programs 40, other program modules 42 and program data 44. In particular, application programs 40 can include programs for implementing any one of modules discussed above. Program data 44 may include any data used by the systems and methods discussed above.
[0121] Processing unit 12, also referred to as a processor, executes programs in system memory 14 and solid state memory 25 to perform the methods described above.
[0122] Input devices including a keyboard 63 and a mouse 65 are optionally connected to system bus 16 through an Input/Output interface 46 that is coupled to system bus 16. Monitor or display 48 is connected to the system bus 16 through a video adapter 50 and provides graphical images to users. Other peripheral output devices (e.g., speakers or printers) could also be included but have not been illustrated. In accordance with some embodiments, monitor 48 comprises a touch screen that both displays information and detects the locations on the screen where the user is contacting the screen.
[0123] The computing device 10 may operate in a network environment utilizing connections to one or more remote computers, such as a remote computer 52. The remote computer 52 may be a server, a router, a peer device, or other common network node. Remote computer 52 may include many or all of the features and elements described in relation to computing device 10, although only a memory storage device 54 has been illustrated in
[0124] The computing device 10 is connected to the LAN 56 through a network interface 60. The computing device 10 is also connected to WAN 58 and includes a modem 62 for establishing communications over the WAN 58. The modem 62, which may be internal or external, is connected to the system bus 16 via the I/O interface 46.
[0125] In a networked environment, program modules depicted relative to the computing device 10, or portions thereof, may be stored in the remote memory storage device 54. For example, application programs may be stored utilizing memory storage device 54. In addition, data associated with an application program may illustratively be stored within memory storage device 54. It will be appreciated that the network connections shown in
[0126] Although elements have been shown or described as separate embodiments above, portions of each embodiment may be combined with all or part of other embodiments described above.
[0127] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms for implementing the claims.