Fitting Lognormal PDFs: 2 Modes¶
This notebook demonstrates how to use a neural network to obtain an initial guess for fitting a lognormal probability density function (PDF) with two modes. It also shows how to optimize the initial guess using a cost function and visualize the results.
import numpy as np
import matplotlib.pyplot as plt
from particula.particles.properties import lognormal_pdf_distribution
from particula_beta.data.process.ml_analysis import generate_and_train_2mode_sizer
Generate Simulated Data¶
We start by generating simulated lognormal data for two modes. The data includes some noise to simulate real-world conditions.
# Generate x-values (particle sizes)
x_values = np.logspace(1.5, 3, 150)
# Generate lognormal PDF with noise
concentration_pdf = lognormal_pdf_distribution(
x_values=x_values,
mode=np.array([80, 150]),
geometric_standard_deviation=np.array([1.2, 1.3]),
number_of_particles=np.array([200, 500]),
)
# Introduce noise to the data
concentration_pdf = concentration_pdf * np.random.uniform(
low=0.8, high=1.2, size=concentration_pdf.shape
)
# Plot the noisy data
plt.figure(figsize=(8, 6))
plt.plot(x_values, concentration_pdf, label="Noisy Data")
plt.xscale("log")
plt.xlabel("Particle Size")
plt.ylabel("Concentration PDF")
plt.title("Simulated Lognormal Data with Noise")
plt.legend()
plt.show()
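Before moving on, it helps to see what `lognormal_pdf_distribution` is computing: a two-mode PDF is just the number-weighted sum of single-mode lognormal densities, so integrating it over size recovers the total particle number. The sketch below is a hypothetical stand-in (the helper `two_mode_lognormal_pdf` is not part of particula, and it assumes `mode` means the geometric median of each mode):

```python
import numpy as np


def two_mode_lognormal_pdf(x, modes, gsds, numbers):
    """Number-weighted sum of lognormal PDFs.

    Each mode contributes N_i * lognormal(x; mu_i, sigma_i), with
    mu_i = ln(mode_i) and sigma_i = ln(gsd_i).
    """
    pdf = np.zeros_like(x)
    for mode, gsd, number in zip(modes, gsds, numbers):
        sigma = np.log(gsd)
        pdf += number * np.exp(
            -((np.log(x) - np.log(mode)) ** 2) / (2 * sigma**2)
        ) / (x * sigma * np.sqrt(2 * np.pi))
    return pdf


x = np.logspace(1.5, 3, 150)
pdf = two_mode_lognormal_pdf(x, [80, 150], [1.2, 1.3], [200, 500])

# trapezoidal integral over size should recover roughly 200 + 500 particles
total = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))
```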
Initial Guess Using a Neural Network¶
Next, we use a pre-trained neural network model to get an initial guess for the lognormal parameters (mode, geometric standard deviation, and number of particles).
# Get initial guess from the ML model
(
mode_values_guess,
geometric_standard_deviation_guess,
number_of_particles_guess,
) = generate_and_train_2mode_sizer.lognormal_2mode_ml_guess(
logspace_x=x_values,
concentration_pdf=concentration_pdf,
)
# Display the initial guess results
print("Initial Guess:")
print(f"Mode: {mode_values_guess}")
print(f"GSD: {geometric_standard_deviation_guess}")
print(f"Number of particles: {number_of_particles_guess}")
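As a rough sanity check on what such a guess should produce, the same three parameters can also be estimated with a simple peak-finding heuristic. This is a hypothetical sketch using SciPy, not the particula_beta implementation: the modes come from the two tallest local maxima, the GSDs are a generic placeholder, and the particle numbers come from splitting the integral of the PDF at the geometric midpoint between the modes.

```python
import numpy as np
from scipy.signal import find_peaks


def lognormal_mode(x, mode, gsd, number):
    """Single lognormal PDF scaled by particle number (mode = geometric median)."""
    sigma = np.log(gsd)
    return number * np.exp(
        -((np.log(x) - np.log(mode)) ** 2) / (2 * sigma**2)
    ) / (x * sigma * np.sqrt(2 * np.pi))


def heuristic_2mode_guess(x, pdf):
    """Rough two-mode guess: peak locations for modes, integral split for numbers."""
    peaks, _ = find_peaks(pdf)
    # keep the two tallest local maxima, sorted by particle size
    order = np.argsort(pdf[peaks])[::-1][:2]
    peaks = np.sort(peaks[order])
    modes = x[peaks]
    gsd_guess = np.full(2, 1.25)  # generic spread; the optimizer refines it later
    # apportion the total particle number at the geometric midpoint
    split = np.searchsorted(x, np.sqrt(modes[0] * modes[1]))
    trap = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)
    numbers = np.array([trap[:split].sum(), trap[split:].sum()])
    return modes, gsd_guess, numbers


# smooth (noise-free) two-mode data for a sanity check
x = np.logspace(1.5, 3, 150)
pdf = lognormal_mode(x, 80, 1.2, 200) + lognormal_mode(x, 150, 1.3, 500)
modes, gsds, numbers = heuristic_2mode_guess(x, pdf)
```

On noisy data this heuristic is far less reliable than the trained model, which is exactly why a learned initial guess is attractive here.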
Optimize the Lognormal Fit¶
With the initial guess obtained, we now optimize the parameters using a cost function to minimize the difference between the actual data and the fitted lognormal distribution.
(
mode_values_optimized,
gsd_optimized,
number_of_particles_optimized,
r2_optimized,
optimization_results,
) = generate_and_train_2mode_sizer.optimize_lognormal_2mode(
mode_guess=mode_values_guess,
geometric_standard_deviation_guess=geometric_standard_deviation_guess,
number_of_particles_in_mode_guess=number_of_particles_guess,
x_values=x_values,
concentration_pdf=concentration_pdf,
)
# Display the optimized results
print("Optimized Fit:")
print(f"Optimized mode values: {mode_values_optimized}")
print(f"Optimized GSD: {gsd_optimized}")
print(f"Optimized number of particles: {number_of_particles_optimized}")
print(f"Optimized R²: {r2_optimized}")
print(f"Best optimization method: {optimization_results['method']}")
Optimized Fit: Optimized mode values: [ 77.92048882 146.4391948 ] Optimized GSD: [1.19264682 1.30742154] Optimized number of particles: [175.92002921 544.65291938] Optimized R²: 0.9736896055718199 Best optimization method: trust-constr
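Conceptually, the optimizer is minimizing a cost that measures the mismatch between a candidate two-mode mixture and the data, starting from the ML guess. The sketch below illustrates that idea with a mean-squared-error cost and SciPy's derivative-free Nelder-Mead method; the helper names, the specific cost, and the bounds are illustrative assumptions, not particula_beta internals. It also shows how an R² like the one reported above can be computed.

```python
import numpy as np
from scipy.optimize import minimize


def lognormal_mixture(x, params):
    """Two-mode lognormal PDF; params = [mode1, mode2, gsd1, gsd2, n1, n2]."""
    pdf = np.zeros_like(x)
    for mode, gsd, number in zip(params[:2], params[2:4], params[4:6]):
        sigma = np.log(gsd)
        pdf += number * np.exp(
            -((np.log(x) - np.log(mode)) ** 2) / (2 * sigma**2)
        ) / (x * sigma * np.sqrt(2 * np.pi))
    return pdf


def cost(params, x, target):
    """Mean-squared error between the candidate mixture and the data."""
    return np.mean((lognormal_mixture(x, params) - target) ** 2)


x = np.logspace(1.5, 3, 150)
true_params = np.array([80.0, 150.0, 1.2, 1.3, 200.0, 500.0])
target = lognormal_mixture(x, true_params)

# a deliberately imperfect initial guess, standing in for the ML output
guess = np.array([70.0, 160.0, 1.25, 1.25, 250.0, 450.0])
bounds = [(40, 400), (40, 1000), (1.05, 2.0), (1.05, 2.0), (10, 2000), (10, 2000)]
result = minimize(
    cost, guess, args=(x, target),
    method="Nelder-Mead", bounds=bounds, options={"maxiter": 5000},
)

# goodness of fit, reported as R-squared
fitted = lognormal_mixture(x, result.x)
ss_res = np.sum((target - fitted) ** 2)
ss_tot = np.sum((target - target.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

The library's `optimize_lognormal_2mode` reportedly tries several methods (the run above selected `trust-constr`) and returns the best, which is why it also hands back an `optimization_results` record.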
Visualization¶
Finally, we compare the original data, the initial guess from the neural network, and the optimized fit. This step helps us understand how close the initial guess was and how much improvement was achieved through optimization.
# Generate concentration PDFs for guess and optimized values
concentration_pdf_guess = lognormal_pdf_distribution(
x_values=x_values,
mode=mode_values_guess,
geometric_standard_deviation=geometric_standard_deviation_guess,
number_of_particles=number_of_particles_guess,
)
concentration_pdf_optimized = lognormal_pdf_distribution(
x_values=x_values,
mode=mode_values_optimized,
geometric_standard_deviation=gsd_optimized,
number_of_particles=number_of_particles_optimized,
)
# Plot the original, guess, and optimized PDFs
plt.figure(figsize=(10, 7))
plt.plot(x_values, concentration_pdf, label="Original Data", linewidth=2)
plt.plot(x_values, concentration_pdf_guess, label="ML Guess", linestyle="--", linewidth=3)
plt.plot(
x_values,
concentration_pdf_optimized,
label="Optimized Fit",
linestyle="-.",
linewidth=4,
)
plt.xscale("log")
plt.xlabel("Particle Size")
plt.ylabel("Concentration PDF")
plt.title("Comparison of Original, ML Guess, and Optimized Fit")
plt.legend()
plt.show()
Additional Fits¶
Let's fit a different simulated data set to see how the guess and optimization behave when the modes are farther apart.
# Example 1: two well-separated modes
x_values_example1 = np.logspace(1.2, 4, 150)
concentration_pdf_example1 = lognormal_pdf_distribution(
x_values=x_values_example1,
mode=np.array([100, 800]),
geometric_standard_deviation=np.array([1.1, 1.1]),
number_of_particles=np.array([300, 800]),
)
# Add noise
concentration_pdf_example1 = concentration_pdf_example1 * np.random.uniform(
low=0.8, high=1.25, size=concentration_pdf_example1.shape
)
# Obtain initial guess
mode_guess1, gsd_guess1, particles_guess1 = (
generate_and_train_2mode_sizer.lognormal_2mode_ml_guess(
logspace_x=x_values_example1,
concentration_pdf=concentration_pdf_example1,
)
)
# Optimize
mode_opt1, gsd_opt1, particles_opt1, r2_opt1, results1 = (
generate_and_train_2mode_sizer.optimize_lognormal_2mode(
mode_guess=mode_guess1,
geometric_standard_deviation_guess=gsd_guess1,
number_of_particles_in_mode_guess=particles_guess1,
x_values=x_values_example1,
concentration_pdf=concentration_pdf_example1,
)
)
# Display and plot
print("Example 1:")
print("Initial Guess:")
print(f"Mode: {mode_guess1}")
print(f"GSD: {gsd_guess1}")
print(f"Number of particles: {particles_guess1}")
print(f"Optimized mode values: {mode_opt1}")
print(f"Optimized GSD: {gsd_opt1}")
print(f"Optimized number of particles: {particles_opt1}")
print(f"Optimized R²: {r2_opt1}")
print(f"Best optimization method: {results1['method']}")
plt.figure(figsize=(8, 6))
plt.plot(x_values_example1, concentration_pdf_example1, label="Original Data")
plt.plot(
x_values_example1,
lognormal_pdf_distribution(
x_values=x_values_example1,
mode=mode_opt1,
geometric_standard_deviation=gsd_opt1,
number_of_particles=particles_opt1,
),
label="Optimized Fit",
)
plt.xscale("log")
plt.xlabel("Particle Size")
plt.ylabel("Concentration PDF")
plt.legend()
plt.show()
Example 1: Initial Guess: Mode: [149.48451487 739.05747291] GSD: [1.18937477 1.07885284] Number of particles: [350.00811131 776.51103601] Optimized mode values: [ 99.38098192 795.02739947] Optimized GSD: [1.10427722 1.09958701] Optimized number of particles: [284.06490163 819.18581964] Optimized R²: 0.9779916485645566 Best optimization method: Powell
Conclusion¶
This notebook demonstrated the use of a neural network for generating an initial guess for fitting a lognormal PDF with two modes. The initial guess was then optimized to improve the fit, and the results were visualized to compare the original data, the initial guess, and the optimized fit.