Energy optimization management of microgrid using improved soft actor-critic algorithm

To tackle the challenges associated with variability and uncertainty in distributed power generation, as well as the complexities of solving high-dimensional energy management mathematical models in microgrid energy optimization, a microgrid energy optimization management method is proposed based on an improved soft actor-critic algorithm. In the proposed method, the improved soft actor-critic algorithm employs an entropy-based objective function to encourage target exploration without assigning significantly higher probabilities to any part of the action space. This simplifies the analysis of distributed power generation variability and uncertainty while effectively mitigating the convergence fragility issues in solving the high-dimensional mathematical model of microgrid energy management. The effectiveness of the proposed method is validated through a case study analysis of microgrid energy optimization management. The results revealed an increase of 51.20%, 52.38%, 13.43%, 16.50%, 58.26%, and 36.33% in the total profits of a microgrid compared with the Deep Q-network algorithm, the state-action-reward-state-action algorithm, the proximal policy optimization algorithm, the ant-colony based algorithm, a microgrid energy optimization management strategy based on the genetic algorithm and the fuzzy inference system, and the theoretical retailer strategy, respectively. Additionally, compared with other methods and strategies, the proposed method can learn more optimal microgrid energy management behaviors and anticipate fluctuations in electricity prices and demand.


Introduction
Energy supply and management have always been among the significant challenges faced by modern society. With the rapid development of renewable energy and the widespread adoption of distributed energy sources, microgrids (MGs) have garnered extensive attention as an emerging paradigm in energy systems (Saeed et al., 2023; Kheiter et al., 2022). MGs, often incorporating renewable energy sources and energy storage, have emerged as viable solutions to address the challenges of conventional centralized power systems (Ahmad et al., 2023; Mostefa et al., 2023). As decentralized energy systems that can operate independently or in conjunction with the main grid, MGs also offer a promising avenue for enhancing energy efficiency, resilience, and sustainability, which makes them well-suited for supporting peak load management on the main power grid (Mostefa et al., 2023; Lagouir et al., 2021).
Efficient energy management is critical for the seamless functioning of MGs, especially in the context of increasing renewable energy penetration (Alzahrani et al., 2023; Shezan et al., 2023). Energy optimization not only ensures the reliable and stable operation of MGs but also contributes to cost reduction, environmental sustainability, and grid resilience. In traditional power systems, centralized energy management and control strategies are typically sufficient to meet the requirements for their efficient operation (Tajjour et al., 2023; Dey et al., 2023). However, in MGs, due to the constraints imposed by diverse energy resources and constantly changing power demands, traditional energy optimization management methods are no longer applicable (Alamir et al., 2023; Zhang et al., 2023). Therefore, as the complexity of MG systems grows, the need for intelligent and adaptive algorithms becomes paramount to address the intricacies associated with energy generation, storage, and consumption (Jahani et al., 2023; El Bourakadi et al., 2022).
Hou et al. (2023) proposed a proactive energy management and collaborative optimization method based on a multi-stakeholder game for community MGs with multiple renewable energy sources. Simulation results demonstrated that this approach can effectively enhance the economic viability of electricity consumption for users and the on-site integration capacity of renewable energy. Gao et al. (2023) developed a remote island MG energy management strategy based on master-slave games. The economic and practical efficiency of this strategy was validated through simulation examples. Leonori et al. (2020) introduced an MG energy optimization management strategy based on genetic algorithms. The effectiveness of the model and algorithm was verified through a small-scale MG case study, whose results indicated that the model achieved optimal energy management for the MG.

Research Article
ISSN: 2252-4940/© 2024. The Author(s). Published by CBIORE
Datta et al. (2023) presented an energy management scheme for multi-MG systems, encompassing distributed resource scheduling, renewable energy integration, and plug-in electric vehicle penetration. The presented method was validated on a test system comprising residential, commercial, and industrial MGs, with simulation results confirming its effectiveness. Alamir et al. (2023b) developed an improved version of the artificial rabbit optimization algorithm, enhanced with quantum mechanics and the Monte Carlo method, for optimal MG energy scheduling. Simulation results affirmed the effectiveness of the developed approach in reducing operation costs and maximizing energy usage efficiency. Rodriguez et al. (2023) outlined an MG energy management system utilizing fuzzy logic control. Simulation results demonstrated that the developed method efficiently harnessed available solar energy resources while extending battery life through efficient control of overcharging and deep discharging.
While the aforementioned studies have achieved commendable results in MG energy management, they necessitate a comprehensive understanding of the current and subsequent dynamic states of the MG. In recent years, reinforcement learning (RL) has garnered extensive research interest as a machine learning method and has shown remarkable success in addressing complex control issues (Ibrahim et al., 2023; Cavus et al., 2023). The soft actor-critic (SAC) algorithm, rooted in RL, has demonstrated remarkable success in optimizing complex systems (Kim et al., 2023; Dong et al., 2024). Originally designed for robotic control tasks, SAC has shown versatility and adaptability, making it an attractive candidate for application in MG energy management (Bao et al., 2023; Topa et al., 2021).
To address the challenges present in the aforementioned studies, this paper introduces an energy optimization management approach based on an improved SAC algorithm. The main contributions of this research are summarized as follows.
First, by incorporating entropy into the objective function, the algorithm balances exploration and exploitation. Unlike traditional methods that assign high probabilities to certain actions, the entropy-based approach maintains a level of uncertainty, allowing the algorithm to explore a broader action range. This strategy enhances the adaptability of the improved SAC algorithm, making it particularly well-suited for dynamic environments where the optimal action may change over time. The incorporation of entropy not only fosters exploration but also contributes to the robustness and versatility of the algorithm in real-world applications.
Second, the proposed method effectively addresses the challenges associated with MG energy management. The inherent variability and uncertainty of distributed generators (DGs) pose formidable obstacles to achieving reliable and efficient energy management. The conventional mathematical models used in MG energy management often suffer from convergence brittleness, making them less adaptive to dynamic operating conditions. The proposed method mitigates these challenges: the algorithm handles variability and uncertainty, providing a more resilient and robust solution to MG energy management. This not only improves the stability of MG operations but also opens up new possibilities for the integration of renewable energy sources.
Third, in electricity price prediction for MGs, the proposed method departs from conventional approaches. Unlike methods that require an exhaustive understanding of MG dynamics for accurate predictions, the proposed approach does not need a comprehensive solution to MG dynamics, which streamlines the prediction process. This not only reduces the computational complexity associated with forecasting purchase and sale prices but also enhances the applicability of the method to a wider range of scenarios. By focusing on key features and leveraging the inherent adaptability of the SAC algorithm, the proposed approach achieves accurate and efficient electricity price predictions without the need for an overly intricate understanding of MG dynamics. This makes MG energy economics more accessible and practical for real-world implementation.
The aim of this research is to effectively optimize energy management within MGs and enhance the reliability and economic efficiency of MG energy management, while tackling the challenges associated with variability and uncertainty in distributed power generation, as well as the complexities of solving high-dimensional energy management mathematical models in microgrid energy optimization.

Microgrid architecture and configuration
The schematic diagram of the MG under investigation, situated in the local area, is depicted in Fig. 1. This MG configuration encompasses several integral components. The analyzed MG in Fig. 1 is interconnected with various distributed generators (DGs), including a diesel generator, four photovoltaic generator systems (PVs), four wind turbines (WTs), and three battery energy storage systems (BESSs). A single diesel generator with a maximum capacity of 20 kW is considered in the energy optimization model of the MG. The parameter values associated with the DGs are presented in Table 1.

Energy network components
The MG is interconnected with the distribution network (DN), which facilitates the continuous exchange of electrical energy with the electricity market. Bidirectional communication between each component within the MG and the energy management system (EMS) enables the exchange of vital information, including electricity pricing, battery charging/discharging status, and power generation statistics (Zhang et al., 2021). The EMS uses smart agents to relay control signals to the various components, manage on/off operations for temperature control loads (TCLs) and charge/discharge procedures for energy storage systems (ESSs), and send or receive control signals for purchasing energy from or selling energy to the DN.

Energy storage systems
ESSs play a pivotal role in modern energy management, providing a means to store and release energy when needed. These systems contribute to grid stability, enable the integration of renewable energy sources, and enhance the overall reliability of energy supply. The ESSs studied in the MG under investigation are community-level ESSs rather than individual residential battery ESSs. The capacity of each ESS is designed to meet a minimum of 2 hours' worth of energy demand for all users within the MG. At each time step $t$, the dynamic variation in energy within an ESS is modeled as follows (2). ESSs examine their discharging conditions and release available electrical energy. If the ESSs cannot fully supply the electrical energy requested by the EMS, the remaining amount is automatically provided by the DN.
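The charge and discharge bookkeeping described above can be illustrated with a minimal sketch. It assumes a simple efficiency-based storage model that is not taken from the paper; the function names `ess_step` and `supply_request`, the efficiencies `eta_c` and `eta_d`, and the one-hour step are all hypothetical choices made for illustration.

```python
def ess_step(energy_kwh, charge_kw, discharge_kw, capacity_kwh,
             eta_c=0.9, eta_d=0.9, dt_h=1.0):
    """Advance the stored energy by one time step.

    Returns (new_energy_kwh, delivered_kwh). The efficiencies are
    illustrative assumptions, not values from the paper.
    """
    # Charging: only a fraction eta_c of the incoming energy is stored,
    # and the store cannot exceed its capacity.
    stored = min(charge_kw * dt_h * eta_c, capacity_kwh - energy_kwh)
    energy_kwh += stored
    # Discharging: delivering x kWh drains x / eta_d kWh from the store.
    deliverable = energy_kwh * eta_d
    delivered = min(discharge_kw * dt_h, deliverable)
    energy_kwh = max(0.0, energy_kwh - delivered / eta_d)
    return energy_kwh, delivered


def supply_request(energy_kwh, request_kwh, capacity_kwh):
    """Serve an EMS request from the ESS; the DN covers any shortfall."""
    new_energy, delivered = ess_step(energy_kwh, 0.0, request_kwh, capacity_kwh)
    from_dn = request_kwh - delivered
    return new_energy, delivered, from_dn
```

In this sketch, a request that exceeds what the store can deliver is split between the ESS and the DN, which mirrors the fallback behaviour described in the text.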

Distributed generators
DGs are small-scale power generation sources that are decentralized and located close to the end-users or within the DN. Unlike traditional centralized power plants, DGs contribute to local generation and can operate independently or in conjunction with the main grid. These systems offer various benefits, including improved grid resilience, increased energy efficiency, and support for renewable energy integration. The MG is equipped with DGs capable of producing varying amounts of electrical energy based on weather conditions. The DG model does not rely on a generative model analysis but rather employs actual distributed generation data, such as wind power data obtained from the National Renewable Energy Laboratory (NREL). DGs share real-time generation data ($G_t$) with the EMS and can directly supply power to the local grid.

Distribution network
When the DGs' supply falls short, the DN can promptly provide electrical energy to the MG. When there is an excess of electrical energy, the DN can also accept surplus electrical energy from the MG. The energy transactions between the DN and DGs are carried out in real time according to the fluctuations in electrical energy pricing in the electricity market, where the upward and downward price adjustments are denoted as $P_t^{\mathrm{up}}$ and $P_t^{\mathrm{down}}$, respectively. To establish the priority power source during electricity supply shortages and the priority discharge source during electricity surpluses, the EMS exclusively manages the electrical switches of the DN. Subsequently, the EMS receives information regarding the amount of electricity ($E_t$) to be purchased from or sold to the DN. A positive value of $E_t$ indicates purchasing electrical energy from the market, while a negative value of $E_t$ signifies selling electrical energy to the market.
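The sign convention for exchanges with the DN can be sketched in a few lines. This is an illustrative example only: the function name `dn_cash_flow` and the parameters `buy_price` and `sell_price` are hypothetical, and the mapping of the paper's upward and downward price adjustments onto buying and selling is deliberately left open.

```python
def dn_cash_flow(e_kwh, buy_price, sell_price):
    """Signed cost of trading e_kwh with the DN.

    e_kwh > 0 means energy is purchased from the market (positive cost);
    e_kwh < 0 means energy is sold to the market (negative cost, i.e. revenue).
    """
    if e_kwh >= 0:
        return e_kwh * buy_price
    return e_kwh * sell_price


def net_exchange(demand_kwh, local_supply_kwh):
    """Energy to trade: positive on a shortage, negative on a surplus."""
    return demand_kwh - local_supply_kwh
```

For instance, `dn_cash_flow(net_exchange(5.0, 8.0), 0.2, 0.1)` evaluates a 3 kWh surplus sold at the assumed selling price.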

Temperature control loads
TCLs refer to systems or devices that are designed to regulate and maintain a specific temperature within a given space. These loads are prevalent in various sectors, including residential, commercial, and industrial settings, and they play a crucial role in ensuring comfort, preserving products, and supporting industrial processes. TCLs adhere to the principle of heat conservation and can serve as a significant source of flexibility in energy provision. TCLs can be directly controlled at each time step $t$ by the agents, and the energy cost incurred by users to maintain indoor comfort is accounted for by $C_{\mathrm{gen}}$. The controller for TCLs receives on/off operation signals from the agents. The TCL controllers perform on/off operations by examining temperature constraints and based on the following conditions.
where $u_{b,t}^{i}$ represents the on/off operation command given by a TCL controller and $u_{t}^{i}$ denotes the on/off operation command received from the agents. $T_{t}^{i}$ is the operating temperature of the analyzed TCL $i$ (heat pump, water heater, refrigerator, etc.) at time step $t$, or the indoor temperature of the room where TCL $i$ (air conditioner) is located at time step $t$. $T_{\max}^{i}$ and $T_{\min}^{i}$ respectively stand for the maximum and minimum temperatures determined by users.
The research conducted primarily focuses on air conditioner-type TCLs, and the dynamic temperature variation process in the room where the analyzed TCL is located is modeled as follows:
where $T_{t}^{i}$ represents the indoor temperature, $T_{m,t}^{i}$ is the temperature of the TCL's output air, $T_{t}^{0}$ is the outdoor temperature, and $C_{a}^{i}$ and $C_{m}^{i}$ represent the thermal mass of the air and building materials, respectively. $q^{i}$ denotes an uncontrollable heat load within the analyzed building, and $P_{\mathrm{tcl}}^{i}$ represents the TCL's rated output power.
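A minimal sketch of how a TCL controller might reconcile an agent command with the user-defined temperature limits is given below. The override rule is an assumption, since the exact conditions are not recoverable from the text, and the sketch assumes a cooling-type device (switching on lowers the temperature); the function name `tcl_command` is hypothetical.

```python
def tcl_command(agent_cmd, temp, t_min, t_max):
    """Return the on/off action (1 = on, 0 = off) applied to a cooling TCL.

    Assumed rule: the temperature limits take priority over the agent.
      - above t_max: force the device on to cool the room
      - below t_min: force the device off
      - otherwise:   follow the agent's command
    """
    if temp > t_max:
        return 1
    if temp < t_min:
        return 0
    return 1 if agent_cmd else 0
```

For a heating-type device the two forced branches would be swapped.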

Residential loads
The analyzed residential loads represent household electricity demands that cannot be directly controlled within the MG. They exhibit daily fluctuations, with their variable energy consumption influenced by electricity pricing. The electricity load $L_{t}^{i}$ of household $i$ at time step $t$ is modeled as follows:

$$L_{t}^{i} = L_{b,t}^{i} - L_{\mathrm{SL},t}^{i} + L_{\mathrm{PB},t}^{i}$$

where $L_{b,t}^{i} > 0$ represents the daily baseline load power (Nakabi et al., 2019). The sensitivity parameter $\beta_i$ (where $\beta_i \in [0,1]$) indicates the proportion of load power variation relative to the price fluctuation magnitude. $L_{\mathrm{SL},t}^{i}$ is the transferred load, and $\delta_t$ is the change in electricity pricing at time step $t$ compared to the previous moment. $L_{\mathrm{SL},t}^{i}$ is positive for high electricity prices ($\delta_t > 0$) and negative for low electricity prices ($\delta_t < 0$). $L_{\mathrm{PB},t}^{i}$ corresponds to the load amount transferred from the previous time period to the current one. For a given time, positively transferred loads must operate after a certain time delay, while negatively transferred loads will be retained in the subsequent time steps.
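The price response described above can be sketched as follows. The sketch assumes that the shifted load is proportional to the baseline load, the sensitivity, and the price change; that proportional form is an assumption for illustration and is not stated in the text. The function name `residential_load` and its arguments are hypothetical.

```python
def residential_load(base_load, beta, delta_price, payback=0.0):
    """Return (actual_load, shifted_load) for one household and time step.

    base_load   : baseline load power (> 0)
    beta        : price sensitivity in [0, 1]
    delta_price : price change relative to the previous step
    payback     : load carried over from earlier steps
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # Positive when prices rise (load is deferred), negative when they fall.
    shifted = base_load * beta * delta_price
    actual = base_load - shifted + payback
    return actual, shifted
```

With a price rise the household consumes less than its baseline now, and the deferred amount can later be passed back in through `payback`.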

Energy management objective
The goal of energy optimization management in MGs is to identify the optimal strategy $\pi^*$ that satisfies the following condition:

$$\pi^* = \arg\max_{\pi} \mathbb{E}\{R^{\pi} \mid s_t\}$$

where $R^{\pi}$ represents the expected total profit of the MG under a given control strategy $\pi$, and $\mathbb{E}\{R^{\pi} \mid s_t\}$ denotes the average value of $R^{\pi}$ under a certain state $s_t$.
In the training process of an RL algorithm, four elements are primarily employed: the state space $S$, the action space $A$, the state transition probabilities $P$, and the value function $V$ (Homod et al., 2022; Moos et al., 2022). During the learning process, the agent interacts with the system and executes an action $a_t \in A \subseteq \mathbb{R}^{n_a}$, where $a_t$ represents the action executed at time step $t$. Subsequently, the system transitions from its current state $s_t \in S \subseteq \mathbb{R}^{n_s}$ to its subsequent state $s_{t+1}$ (Du et al., 2022; Shavandi et al., 2022).
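For a given strategy $\pi$, the value function over a certain time interval, $V^{\pi}(s_t)$, depends on the sequence of state transitions, $P(s, a, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$. The discounted sum of future profits is represented as (Singh et al., 2023; Tian et al., 2023)

$$V^{\pi}(s_t) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{K} \gamma^{k} r_{t+k} \,\middle|\, s_t\right]$$

where $r_{t+k}$ is the profit obtained at time step $t+k$, $K$ is the number of remaining steps in the interval, and $\gamma \in (0,1]$ is the discount factor that weights future profits. $V^{*}(s_t)$ represents the maximum discounted profit that the agent obtains when executing the optimal strategy starting from state $s_t$. From this, it can be seen that (Alavizadeh et al., 2022; Kosuru et al., 2022)

$$V^{*}(s_t) = \max_{a_t \in A} Q^{*}(s_t, a_t)$$

Meanwhile, $Q^{\pi}(s_t, a_t)$ expresses the expected return of choosing action $a_t$ in state $s_t$ and following the strategy $\pi$ thereafter, and $Q^{*}$ is its value under the optimal strategy.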
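As a small worked illustration of the discounted sum of future profits, the following sketch accumulates a sequence of per-step profits backwards in time. The function name `discounted_return` and the sample values are illustrative only.

```python
def discounted_return(profits, gamma=0.99):
    """Discounted sum r_0 + gamma * r_1 + gamma^2 * r_2 + ...

    profits : per-step profits, ordered from the current step onwards
    gamma   : discount factor in (0, 1]
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    total = 0.0
    # Work backwards so each step adds its profit to the discounted future.
    for r in reversed(profits):
        total = r + gamma * total
    return total
```

With `gamma = 1` the function reduces to a plain sum, which is why the discount factor is allowed to reach 1 only over a finite interval.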

The optimization technique
The soft actor-critic (SAC) algorithm, based on the maximum entropy RL framework, is an off-policy algorithm (Sun et al., 2022). It is capable of handling continuous action spaces, which enhances its applicability to various control problems. SAC employs the actor-critic architecture (Zheng et al., 2023).
The actor-critic architecture employs two distinct deep neural networks to approximate the policy function $\pi$ and the state value function $V$ (Zheng et al., 2023). The actor maps the current state to what it deems the optimal action, while the critic evaluates actions by computing the value function. The SAC algorithm operates within the maximum entropy RL framework, aiming to maximize both the expected reward and the entropy:

$$\pi^{*} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}\left[r(s_t, a_t) + \alpha H\big(\pi(\cdot \mid s_t)\big)\right]$$

where $\rho_{\pi}$ is the state-action distribution induced by $\pi$ and $H(\pi(\cdot \mid s_t))$ represents the Shannon entropy term, indicating the agent's uncertainty when taking random actions. $\alpha$ is the regularization coefficient, signifying the importance of the entropy term relative to the rewards. In traditional reinforcement learning algorithms, $\alpha$ is generally set to zero. Maximizing this objective function explicitly encourages the agent to explore new strategies while also preventing it from settling on suboptimal actions (Xu et al., 2022).
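The effect of the entropy term can be made concrete with a small sketch for a discrete action distribution. This is only an illustration: SAC itself works with continuous actions, and the names `shannon_entropy` and `soft_objective` are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(rewards, entropies, alpha):
    """Sum of per-step rewards plus alpha-weighted policy entropies."""
    return sum(r + alpha * h for r, h in zip(rewards, entropies))
```

A uniform distribution over four actions has entropy `log 4`, while a distribution that always picks one action has entropy zero, so a positive `alpha` rewards policies that keep several actions plausible. Setting `alpha = 0` recovers the usual sum of rewards.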
In the studied method for energy optimization in MGs, the SAC algorithm primarily utilizes three network functions: the state value function $V_{\psi}$ parameterized by $\psi$, the soft Q-function $Q_{\theta}$ parameterized by $\theta$, and the policy function $\pi_{\phi}$ parameterized by $\phi$ (Huang et al., 2021).
Firstly, the state value function is trained by minimizing the following error (Tightiz et al., 2023):

$$J_{V}(\psi) = \mathbb{E}_{s_t}\left[\tfrac{1}{2}\Big(V_{\psi}(s_t) - \mathbb{E}_{a_t \sim \pi_{\phi}}\big[Q_{\theta}(s_t, a_t) - \log \pi_{\phi}(a_t \mid s_t)\big]\Big)^{2}\right]$$

That is, training the state value function $V_{\psi}$ involves minimizing the squared difference between its predicted value and the expected value of $Q_{\theta}$, while maximizing the entropy of the policy function $\pi_{\phi}$ (measured by the negative logarithm of the policy function) to the greatest extent possible. To train the approximate function for the policy $\pi_{\phi}$, the following error is minimized (Hu et al., 2022):

$$J_{\pi}(\phi) = \mathbb{E}_{s_t}\left[D_{\mathrm{KL}}\left(\pi_{\phi}(\cdot \mid s_t) \,\middle\|\, \frac{\exp\big(Q_{\theta}(s_t, \cdot)\big)}{Z_{\theta}(s_t)}\right)\right]$$

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence. Minimizing this objective drives the policy distribution toward the exponential of the function $Q_{\theta}$, normalized by the function $Z_{\theta}$.
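To improve the performance of the SAC algorithm and minimize this objective, reparameterization is employed using $a_t = f_{\phi}(\epsilon_t; s_t)$, where $\epsilon_t$ is an input noise sample. This technique ensures that the policy sampling process is differentiable. The parameterized policy objective can be represented as follows (Xiong et al., 2023):

$$J_{\pi}(\phi) = \mathbb{E}_{s_t, \epsilon_t}\left[\log \pi_{\phi}\big(f_{\phi}(\epsilon_t; s_t) \mid s_t\big) - Q_{\theta}\big(s_t, f_{\phi}(\epsilon_t; s_t)\big)\right]$$

Since the normalization function $Z_{\theta}$ is independent of the parameter $\phi$, it is discarded. The unbiased estimate of the gradient for the aforementioned objective is as follows:

$$\hat{\nabla}_{\phi} J_{\pi}(\phi) = \nabla_{\phi} \log \pi_{\phi}(a_t \mid s_t) + \big(\nabla_{a_t} \log \pi_{\phi}(a_t \mid s_t) - \nabla_{a_t} Q_{\theta}(s_t, a_t)\big)\, \nabla_{\phi} f_{\phi}(\epsilon_t; s_t)$$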
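The reparameterization step can be sketched for a one-dimensional action. The sketch assumes a tanh-squashed Gaussian policy, a common choice for SAC that the text does not confirm; the function name `sample_action` and the arguments `mean` and `log_std` (standing in for the policy network outputs) are hypothetical.

```python
import math
import random


def sample_action(mean, log_std, rng):
    """Draw a bounded action as a deterministic function of external noise.

    Returns (action, log_prob). Because the noise eps does not depend on
    the policy parameters, the action is differentiable in mean and log_std.
    """
    eps = rng.gauss(0.0, 1.0)              # noise sample
    pre = mean + math.exp(log_std) * eps   # Gaussian sample before squashing
    action = math.tanh(pre)                # squash into (-1, 1)
    # Gaussian log-density of pre, then the tanh change-of-variables term.
    log_prob = -0.5 * eps ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    log_prob -= math.log(1.0 - action ** 2 + 1e-6)
    return action, log_prob
```

In a full implementation the same computation would be carried out in an automatic differentiation framework so that gradients flow from the critic through the action into the policy parameters.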

Simulation test configuration
The proposed energy management method was implemented using the OpenAI Gym tool in the MG operational environment described in the second section (Huang et al., 2022). The MG energy management process was run for a total of 10 days within the constructed MG operational environment. This process was divided into multiple stages, each lasting one day. For each stage, one day was randomly selected from the 10-day dataset. Each time step $t$ represents one hour, resulting in 24 time steps per day.
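The stage sampling described above can be sketched without any dependency on the Gym interface. The data layout (a list of days, each a list of 24 hourly records) and the function name `sample_episode` are assumptions made for illustration.

```python
import random


def sample_episode(dataset, rng, steps_per_day=24):
    """Pick one day at random and return (day_index, hourly_records)."""
    day = rng.randrange(len(dataset))
    records = dataset[day][:steps_per_day]
    if len(records) != steps_per_day:
        raise ValueError("each day must provide one record per hour")
    return day, records


# Example: 10 placeholder days, each with 24 hourly (day, hour) records.
dataset = [[(d, h) for h in range(24)] for d in range(10)]
day, episode = sample_episode(dataset, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sequence of sampled days reproducible across repeated runs.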
Initially, the current state of the MG is determined based on the average power consumption of the TCLs ($\mathrm{SoC}_t$), the charging/discharging status of the ESSs ($\mathrm{BSC}_t$), the utility meter state, the temperature ($T_t$), the generated power ($G_t$), the electricity price in the power market ($P_t$), the time step $t$, and the current load value ($L_{b,t}$). These variables are organized using a vector representation.
where $P_t^{\mathrm{up}}$ and $P_t^{\mathrm{down}}$ are the up- and down-regulation rates, which are the rates for selling electrical energy to and buying electrical energy from the DN, respectively. $G_t$, $E_t^{S}$, and $E_t^{P}$ represent the amount of electrical energy generated, sold to, and purchased from the DN, respectively. $C_{\mathrm{gen}}$ is the energy generation cost charged to customers with controlled loads. $u_{b,t}^{i}$ is the binary variable corresponding to the on or off actions for the TCLs. $C_{\mathrm{tr}}^{\mathrm{imp}}$ and $C_{\mathrm{tr}}^{\mathrm{exp}}$ are the costs associated with importing and exporting electrical energy to the distribution network, respectively.
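To show how quantities of this kind could combine into a per-step profit, a minimal sketch is given below. The revenue-minus-cost structure is an assumption, since the reward expression itself is not recoverable from the text, and every name in the function signature is hypothetical.

```python
def step_profit(load_kwh, market_price,
                gen_kwh, gen_cost,
                bought_kwh, buy_price, import_fee,
                sold_kwh, sell_price, export_fee):
    """Illustrative per-step profit: revenue minus cost (assumed form)."""
    # Revenue: energy sold to local users plus energy exported to the DN.
    revenue = load_kwh * market_price + sold_kwh * sell_price
    # Cost: local generation, imports (price plus transmission fee),
    # and the transmission fee on exports.
    cost = (gen_kwh * gen_cost
            + bought_kwh * (buy_price + import_fee)
            + sold_kwh * export_fee)
    return revenue - cost
```

Summing this quantity over the 24 hourly steps of a day would give a daily profit figure of the kind compared in the results.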
To evaluate the effectiveness of the proposed method, it was compared with two energy management methods.
i) Strategy 1: An MG energy optimization management strategy based on the genetic algorithm and the fuzzy inference system (Leonori et al., 2020). Leonori et al. (2020) showed that this strategy is capable of obtaining optimal information regarding electricity generation, electricity consumption, electricity prices, and temperature. The model of Strategy 1 integrates genetic algorithms (GA) and fuzzy inference systems (FIS), and consists of the following elements:
1) Hierarchical GA and FIS: The EMS of Strategy 1 is synthesized through a hierarchical genetic algorithm and a fuzzy inference system. This hybrid approach combines the optimization capabilities of genetic algorithms with the decision-making flexibility of fuzzy logic.
2) FIS design: The FIS is responsible for defining the consequent part of each rule, tuning the position and shape of the membership functions (MFs), setting the rule weights, and eliminating input MFs deemed ineffective. This ensures that the FIS accurately captures the system's behavior and effectively guides decision-making.
3) Optimization parameter setting: Once the EMS of Strategy 1 is designed and its optimization parameters are set, the model proceeds to implement various strategies for tuning the FIS parameters. This iterative process refines the FIS to enhance its performance in managing MG energy.
4) Tuning FIS parameters: Different strategies are employed to tune the FIS parameters, which may include adjusting the shape and position of MFs, optimizing rule weights, and removing redundant input MFs. These strategies aim to improve the accuracy and efficiency of the FIS in making energy management decisions.
In summary, Strategy 1 uses a hybrid of genetic algorithms and fuzzy inference systems to develop an EMS for an MG. The FIS plays a central role in decision-making, with its parameters optimized through iterative tuning to enhance the overall performance of the energy optimization strategy.
ii) Strategy 2: Theoretical retailers. These retailers purchase an exact amount of electrical energy in a day-ahead electricity market and sell it to a simulated user group at market prices. The model of Strategy 2 involves several key components:
1) Theoretical retailers: The model features theoretical retailers as virtual entities within the simulation framework. These retailers do not physically exist but are conceptual representations used to simulate the behavior of actual market participants.
2) Procurement process: The theoretical retailers purchase a predetermined quantity of electrical energy. This procurement occurs in the day-ahead electricity market, where transactions for future delivery are conducted based on forecasted demand and supply conditions.
3) Exact quantity procurement: The theoretical retailers procure an exact amount of electrical energy, that is, a predetermined quantity based on their anticipated demand or contractual obligations.
4) Day-ahead electricity market: The procurement takes place within the day-ahead electricity market, a segment of the wholesale electricity market where buyers and sellers trade electricity contracts for delivery on the following day. This market allows participants to plan and secure their energy needs in advance.
5) Energy distribution to the simulated user group: After procuring electrical energy, the theoretical retailers distribute it to a simulated user group, which represents the consumers or end-users within the simulation environment.
6) Pricing mechanism: The electrical energy is sold to the simulated user group at market prices, so the pricing within the simulation model reflects prevailing market conditions and the supply-demand dynamics observed in real-world electricity markets.
Overall, the model simulates the behavior of theoretical retailers in procuring electrical energy from the day-ahead electricity market and subsequently distributing it to a simulated user group, all while adhering to market prices.
These comparisons were made to assess the performance of the proposed method relative to these alternative energy management strategies. To compare the proposed method with other RL approaches, its energy management results were also compared with those of two other RL algorithms used for energy optimization management of MGs: the Deep Q-network (DQN) algorithm (Alabdullah et al., 2022) and the state-action-reward-state-action (SARSA) algorithm (Nakabi et al., 2021). To further verify the effectiveness of the proposed method, its results were compared with two recent energy optimization management methods for MGs: the proximal policy optimization (PPO) algorithm (Guo et al., 2022) and the ant-colony based algorithm (ACO) (Suresh et al., 2023).
The proposed energy optimization management method for MGs is simulated, tested, and analyzed according to the defined energy management objective. The data used to verify the effectiveness of the proposed method were collected from an MG powered by a subsidiary of the Southern Power Grid Company, from June 1, 2023 to July 30, 2023. The simulation test platform is a computer equipped with an Intel Core i5 processor at 2.30 GHz, 8 GB RAM, and 1 TB of hard disk space. The analysis of the designed energy optimization management model is performed using MATLAB software, with Gurobi selected as the solver. Each simulation test is repeated 30 times, and each reported result is the average of these repeated tests.

Total profit analysis
The proposed method, DQN, SARSA, Strategy 1, and Strategy 2 were tested on the MG depicted in Fig. 1, and the daily average profit data for 10 days, from day 50 to day 59, were recorded. The comparative analysis of estimated daily earnings from employing the various energy management methods over these ten consecutive days is depicted in Fig. 2. The total profit histograms and statistical values over the 10 days for the proposed method, DQN, SARSA, Strategy 1, and Strategy 2 are shown in Fig. 3.
As presented in Fig. 3, a visual representation highlights the noteworthy success of the proposed method in terms of average profitability when compared to prominent alternatives such as DQN, SARSA, Strategy 1, and even the retailer strategy (Strategy 2).The compelling evidence from this figure underscores the superior financial outcomes achieved through the application of the proposed method.
Notably, the proposed method consistently outperforms its counterparts, namely DQN, SARSA, Strategy 1, and Strategy 2, showing a sustained advantage in profitability over the ten-day period. While Strategy 1 exhibits commendable results in MG energy management, its efficacy is contingent upon a comprehensive understanding of both current and subsequent states within the network dynamics. The proposed method, by contrast, offers superior profitability without the need for an exhaustive understanding of the network dynamics.
As shown in Fig. 3, only the total profit and the daily benefits of Strategy 1 are close to those of the proposed method. To further analyze the advantages of the proposed method, the MG energy optimization management effects of the proposed method are compared only with those of Strategy 1 in the following analysis.

TCL energy distribution and ESS charging status
Fig. 4(a) and Fig. 4(b) illustrate the TCL energy distribution and the charging/discharging status of the ESSs for the proposed method and Strategy 1, respectively. These figures provide insight into the operation of the MG under the two energy management approaches. On the 50th day of the MG energy management simulation, the two methods behave in a strikingly similar way. The synchronized patterns in TCL energy distribution and ESS charging/discharging status indicate that both strategies are comparably effective in optimizing MG operations on this specific day. This convergence highlights a key aspect of the proposed method: its capacity to achieve performance akin to the more complex Strategy 1, which, as mentioned earlier, necessitates a comprehensive understanding of the network dynamics. Fig. 4 thus supports the efficacy and reliability of the proposed method in MG energy management.
The two approaches differ in how they allocate energy. The proposed method accumulates more energy in the TCLs, whereas Strategy 1 allocates a larger share of the available energy to the ESSs.
They also share a characteristic: both choose to store a substantial amount of electrical energy during periods of abundant renewable energy generation and no peak demand, aligning storage decisions with favorable conditions.
In short, the proposed method favors TCL energy accumulation and Strategy 1 favors ESS energy allocation, yet both respond effectively to changing energy conditions and both capitalize on renewable energy surpluses, which contributes to the overall efficacy of MG energy management.

The complexities of electricity prices and demand dynamics
Fig. 5 shows the electricity transactions with the DN under both Strategy 1 and the proposed method, together with the power curve of renewable energy generation, which can be either stored or sold to the DN.
Fig. 5 indicates that the proposed method schedules energy better than Strategy 1, even on days with substantial fluctuations in electricity generation and consumption. The advantage stems from the proposed method's ability to learn and anticipate fluctuations in electricity prices and demand, which lets it adapt its MG energy management behavior to changing conditions. This adaptability is crucial in real-world scenarios, where the energy landscape is inherently variable.
Strategy 1 relies on a comprehensive understanding of the MG dynamics. It must predict not only the purchase and sale prices of electricity but also the availability of electrical energy, so its decision-making process requires an extensive solution of the MG dynamics.
In contrast, the proposed method operates without these constraints. It does not need an exhaustive solution of the MG dynamics and can optimize energy management without predicting market prices or resource availability at the same level of granularity.
This is a key strength of the proposed method: because it operates efficiently without detailed predictions of market dynamics, it is more flexible and practical in real-world scenarios, where uncertainties and fluctuations are inherent.

Energy optimization effect comparison with two latest research works
The proposed method, PPO, and ACO were tested on the MG depicted in Fig. 1, and the daily average profit data for 10 days, from day 50 to day 59, were recorded. The total profit histograms and statistical values for the 10 days for the proposed method, PPO, and ACO are presented in Table 2. PPO uses historical data on energy consumption and renewable energy generation to continuously update the network and learn an optimal policy. To ensure efficient and stable action selection, PPO employs a clipped surrogate loss function. ACO draws inspiration from ant foraging behavior to explore the solution space and identify optimal configurations for energy distribution within a microgrid system. Through iterative exploration, solution construction, and pheromone-based communication, ACO facilitates the discovery of high-quality solutions that maximize energy efficiency while meeting predefined optimization objectives.
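The clipped surrogate loss mentioned above can be illustrated with a minimal sketch. This is the standard PPO clipped objective written in plain Python, not the implementation used in the comparison; the function name `ppo_clipped_surrogate` and the clipping range `clip_eps=0.2` are assumptions chosen for illustration, since the text does not report the PPO hyperparameters.

```python
import math


def ppo_clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the current and the data-collecting policy, respectively.
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum gives a pessimistic bound, so a single update
        # gains nothing from moving the policy far from the old one.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the current and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. The clipping only takes effect once the policy has moved far enough that the ratio leaves the clipping range, which is what keeps the updates stable.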
Table 2 shows that the proposed method achieves a substantial improvement in average profitability over both PPO and ACO. Presenting the profitability metrics side by side gives a clear and objective assessment of the proposed method's performance and of the magnitude of the improvement.
Together, Table 2 and the analysis above provide empirical evidence that the proposed method achieves better financial outcomes than established methods such as PPO and ACO. In summary, the proposed method consistently outperforms DQN, SARSA, Strategy 1, Strategy 2, PPO, and ACO, as shown in Fig. 3 and Table 2.

Conclusions
An MG energy optimization management method is proposed based on an improved soft actor-critic (SAC) algorithm. To validate the effectiveness of the proposed approach, the MG energy optimization management results achieved by the proposed method were compared with those obtained through other energy optimization management methods or strategies. The comparative results reveal that the average profitability of the MG when implementing the proposed method surpasses those of the Deep Q-network (DQN) algorithm, the state-action-reward-state-action (SARSA) algorithm, the retailer strategy, the genetic algorithm, the proximal policy optimization (PPO) algorithm, and the ant-colony based algorithm. Additionally, when employing the proposed method, the MG exhibits superior average profitability over a span of 10 days in comparison to the other methods or strategies. Although MG energy management performs well when Strategy 1 is employed, it necessitates a thorough understanding of the current and subsequent network dynamics. Furthermore, in the MG energy management simulation on the 50th day, the proposed method and Strategy 1 produce similar MG energy optimization management behaviors. However, Strategy 1 necessitates a comprehensive solution of the MG's dynamics, enabling it to predict the purchase and sale prices of electricity, as well as the availability of electrical energy, while the proposed method operates without such constraints.

Fig. 5 Electrical energy exchanged with the main network. (a) Electrical energy interacting with the distribution network on day 50 when the proposed method is adopted; (b) Electrical energy interacting with the distribution network on day 50 when Strategy 1 is adopted; (c) Electrical energy interacting with the distribution network on day 56 when the proposed method is adopted; (d) Electrical energy interacting with the distribution network on day 56 when Strategy 1 is adopted.

Table 1
Parameters of DGs.
ISSN: 2252-4940/© 2024. The Author(s). Published by CBIORE
Daily benefits from running different energy management methods or strategies.