Monday, February 23, 2026

Polynomial Regression With Example

In this post we'll see how to use polynomial regression. Simple linear regression and multiple linear regression assume a straight-line (linear) relationship between predictors and target, but that may not always be the case with real-world data. Also, if the scatterplot of the residuals (y_test - y_pred) versus the predicted values (y_pred) shows curvature or other patterns, it suggests that the relationship between the predictors and the response is non-linear.

In such cases, a simple linear regression is inadequate, and a more flexible model like polynomial regression can often improve the fit.

Polynomial Regression

Polynomial regression is a kind of linear regression that lets you model a non-linear relationship between the independent variables (X) and the dependent variable (y) by using polynomial terms of the independent variable(s).

The polynomial regression model for a single predictor, X, is:

$$ y=\beta _0+\beta _1x+\beta _2x^2+\beta _3x^3+\dots +\beta _nx^n+\epsilon$$

where n is called the degree of the polynomial, so the above equation is an n-th degree polynomial. Such a relationship is called quadratic if the degree is 2, cubic if the degree is 3, and so on. Here

  • y is the dependent variable.
  • x is the independent variable.
  • \( \beta _0, \beta _1, \dots , \beta _n \) are the coefficients of the polynomial terms.
  • \(\epsilon\) is the error term.

If there are multiple predictors (like \(x_1, x_2\)), polynomial regression also includes-

  • powers of each feature (\(x_1^2, x_2^2\))
  • interaction terms (for example, \(x_1 x_2\))

Suppose the predictors are x1,x2,x3. A polynomial regression of degree 2 (quadratic) can be written as:

$$ y=\beta _0+\beta _1x_1+\beta _2x_2+\beta _3x_3+\beta _{11}x_1^2+\beta _{22}x_2^2+\beta _{33}x_3^2+ \\ \beta _{12}x_1x_2+\beta _{13}x_1x_3+\beta _{23}x_2x_3+\epsilon$$

  • \(\beta _0\): intercept
  • \(\beta _i\): linear coefficients
  • \(\beta _{ii}\): quadratic terms (squares of predictors)
  • \(\beta _{ij}\): interaction terms (cross-products between predictors)
  • \(\epsilon\) : error term

The generalized form of polynomial regression is given below.

For a polynomial of degree d with three predictors:

$$y=\sum _{i+j+k\leq d}\beta _{ijk}\, x_1^i\, x_2^j\, x_3^k+\epsilon $$

One thing to keep in mind about polynomial regression is that, though the features are non-linear transformations of inputs, polynomial regression is still considered linear regression since it is linear in the regression coefficients \(\beta _1, \beta _2, \beta _3 … \beta _n\).
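Since the model is linear in the coefficients, an ordinary least-squares fit on hand-built polynomial columns recovers them directly. A minimal sketch with made-up, noise-free data (the generating coefficients 2, 3, -0.5 are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical noise-free data generated from y = 2 + 3x - 0.5x^2
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = 2 + 3 * x - 0.5 * x ** 2

# build the polynomial design matrix [x, x^2] by hand; the model
# that follows is ordinary linear regression on these columns
X_poly = np.column_stack([x, x ** 2])

model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)  # recovers ~2.0 and ~[3.0, -0.5]
```

Even though the fitted curve is a parabola in x, the estimation itself is plain linear regression on the columns \([x, x^2]\).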

Polynomial linear regression using scikit-learn Python library

Dataset used here can be downloaded from- https://www.kaggle.com/datasets/rukenmissonnier/manufacturing-data-for-polynomial-regression/data

Goal is to predict the quality rating based on the given features.

In the implementation, the code is broken into several smaller units with some explanation in between for the data pre-processing steps.

1. Importing libraries and reading CSV file

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
df = pd.read_csv('./manufacturing.csv')
The manufacturing.csv file is in the current directory.

2. Getting info about the data.

print(df.info())

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3957 entries, 0 to 3956
Data columns (total 6 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Temperature (°C)                3957 non-null   float64
 1   Pressure (kPa)                  3957 non-null   float64
 2   Temperature x Pressure          3957 non-null   float64
 3   Material Fusion Metric          3957 non-null   float64
 4   Material Transformation Metric  3957 non-null   float64
 5   Quality Rating                  3957 non-null   float64

You can also use the following command to get summary statistics like mean, standard deviation, min and max values for each column.

print(df.describe())

3. Checking for duplicates

You can check for duplicate rows in order to remove them if required.

#checking for duplicates
print(df.duplicated().sum()) #0

4. Another check is for missing values

#count the number of missing (null, or NaN) values in each column of a DataFrame
print(df.isnull().sum())

Output

Temperature (°C)                  0
Pressure (kPa)                    0
Temperature x Pressure            0
Material Fusion Metric            0
Material Transformation Metric    0
Quality Rating                    0

So, there are no missing values.

5. Checking for multicollinearity

You can also check for multicollinearity by displaying a correlation heatmap, which shows the relationships between variables.

  • Values close to 1 or -1 indicate strong correlations
  • Values close to 0 indicate weak or no correlations
# check for multicollinearity
correlation_matrix = df.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

If you want to remove any columns because of high multicollinearity, the following code can be used. In this example no column has been removed. Note that the code uses the feature matrix X, which is created in the next step.

# select columns with numerical values
v = X.select_dtypes(include ='number')
corr_matrix = v.corr().abs()   # absolute correlations
#corr_matrix
#print(corr_matrix)
upper = corr_matrix.where(
    #upper triangular part of an array
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
)
# get the columns having any corr value > .85
to_drop = [column for column in upper.columns if any(upper[column] > 0.85)]
print(to_drop)
X_reduced = X.drop(columns=to_drop)

6. Feature and label selection

X = df.iloc[:, :-1]
y = df.iloc[:, -1]

Explanation-

X = df.iloc[:, :-1] in this

  • : means "select all rows."
  • :-1 means "select all columns except the last one."

y = df.iloc[:, -1]

  • : means "select all rows."
  • -1 means "select the last column" (using negative indexing).
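A tiny sketch of this slicing on a hypothetical three-column DataFrame (the column names here are made up for illustration):

```python
import pandas as pd

# hypothetical frame: two feature columns and a target column
demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'target': [5, 6]})

X_demo = demo.iloc[:, :-1]   # all rows, every column except the last
y_demo = demo.iloc[:, -1]    # all rows, only the last column

print(list(X_demo.columns))  # ['a', 'b']
print(y_demo.name)           # 'target'
```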

7. Plotting predictor-target relationship using scatter plot to show that it is not linear

#plot predictor-target relationship using scatter plot
features = X.columns
fig, axes = plt.subplots(1, len(features), sharey=True, figsize=(15, 4))
for i, col in enumerate(features):
    #plt.scatter(df[col], df["Quality Rating"])
    sns.scatterplot(x=df[col], y=df["Quality Rating"], ax=axes[i])

    axes[i].set_xlabel(col)
    axes[i].set_title(f"{col} \nvs Quality Rating")
plt.show()
[Figure: scatter plots of each predictor vs Quality Rating]

8. Splitting and scaling data

Splitting is done using train_test_split where test_size is passed as 0.2, meaning 20% of the data is used as test data whereas 80% of the data is used to train the model.

As seen in the polynomial regression equation, higher-degree terms (squared, cubic, etc.) are created from your variables. These terms can take on very large values, which can skew the results. That is why normalizing your features is important; otherwise features with larger numeric ranges can dominate the model.

Note that both fitting and transformation (using fit_transform) are done for the training data, whereas only the transform() method is used for the test data. That's how it should be done.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
#Polynomial Regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.2, random_state = 0)

#scaling values
scaler_X = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)

X_test_scaled = scaler_X.transform(X_test)

9. Polynomial Feature

The next step is to choose the degree of the polynomial. With the help of the PolynomialFeatures class in the scikit-learn library it becomes very easy to transform your existing features into higher-degree terms.

poly_reg = PolynomialFeatures(degree=2, include_bias=False)

x_poly = poly_reg.fit_transform(X_train_scaled)

The parameter include_bias controls whether a bias (intercept) column of ones is added to the transformed feature matrix. When you use PolynomialFeatures together with LinearRegression, by default LinearRegression(fit_intercept=True) already adds an intercept term to the model. So, if you also set include_bias=True in PolynomialFeatures, you'll end up with a redundant constant column of ones in your design matrix.
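A quick way to see the effect of include_bias is to transform a single hypothetical sample both ways:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # one sample, two features

# degree-2 expansion of [a, b] is [1, a, b, a^2, a*b, b^2]
with_bias = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(with_bias)     # [[1. 2. 3. 4. 6. 9.]]  leading constant column of ones
print(without_bias)  # [[2. 3. 4. 6. 9.]]     no constant column
```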

10. Fitting the model

lin_reg = LinearRegression()
lin_reg.fit(x_poly, y_train)

You may wonder why LinearRegression is used here. Keep in mind that it is applied to the polynomial features (x_poly), so the model is still linear in the coefficients.

Once the model is trained, predictions can be made using the test data, which can then be compared with the actual test values (y_test).

# predicting values
y_pred = lin_reg.predict(poly_reg.transform(X_test_scaled))

11. Comparing test and predicted data

# getting the residual percentage
df_results = pd.DataFrame({'Target':y_test, 'Predictions':y_pred})
df_results['Residual'] = df_results['Target'] - df_results['Predictions']
df_results['Difference%'] = np.abs((df_results['Residual'] * 100)/df_results['Target'])
print(df_results.head(10))

Output

      Target  Predictions  Residual  Difference%
3256  100.00       102.00     -2.00         2.00
142   100.00        99.54      0.46         0.46
2623   99.58       103.70     -4.12         4.14
3741  100.00       100.79     -0.79         0.79
2858   99.58       103.68     -4.10         4.11
3137   95.87        93.66      2.22         2.31
2672  100.00        99.01      0.99         0.99
1420  100.00        99.08      0.92         0.92
1669  100.00        98.94      1.06         1.06
1606  100.00        99.25      0.75         0.75

12. Seeing the model metrics such as R squared, mean squared error and root mean squared error.

#Metrics - R-Squared, MSE, RMSE
print("R2 score", r2_score(y_test, y_pred)) 
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error", mse)
print("Root Mean Squared Error", np.sqrt(mse))

13. Plotting residuals vs predicted values

# Residuals = actual - predicted
residuals = y_test - y_pred

# Scatterplot: residuals vs fitted
sns.scatterplot(x=y_pred, y=residuals)
plt.axhline(0, color='red', linestyle='--')  # reference line at 0
plt.xlabel("Predicted Values (y_pred)")
plt.ylabel("Residuals (y_test - y_pred)")
plt.title("Residuals vs Fitted")
plt.show()
[Figure: residuals vs fitted values for the degree-2 model]

In polynomial regression, the plot of residuals vs. predicted values should look like random, evenly scattered points around the horizontal zero line (y=0). As you can see, the above plot doesn't show evenly scattered points, and the residuals form a curved pattern too.

If degree is increased to 5 in polynomial features

poly_reg = PolynomialFeatures(degree=5, include_bias=False)

and the modelling is done again, then the plot of residuals vs. fitted values looks as given below.

[Figure: residuals vs fitted values for the degree-5 model]

The curve is less pronounced than in the quadratic case. However, there is still some systematic pattern; the points are not fully randomly scattered.

That's all for this topic Polynomial Regression With Example. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page


Related Topics

  1. Python Installation on Windows
  2. Encapsulation in Python
  3. Method Overriding in Python
  4. R-squared - Coefficient of Determination
  5. Mean, Median and Mode With Python Examples

You may also like-

  1. Passing Object of The Class as Parameter in Python
  2. Local, Nonlocal And Global Variables in Python
  3. Python count() method - Counting Substrings
  4. Python Functions : Returning Multiple Values
  5. Marker Interface in Java
  6. Functional Interfaces in Java
  7. Difference Between Checked And Unchecked Exceptions in Java
  8. Race Condition in Java Multi-Threading

Wednesday, February 18, 2026

Mean Squared Error (MSE) With Python Examples

Mean Squared Error (MSE) is one of the most widely used metrics for evaluating the performance of regression models. It evaluates the prediction accuracy by measuring the average squared difference between predicted and actual values.

Mean Squared Error (MSE) equation

The formula for the mean squared error is-

$$ MSE=\frac{1}{n}\sum _{i=1}^n(y_i-\hat {y}_i)^2 $$

Here-

  • n is the total number of observations.
  • \(y_i\) is the actual value.
  • \(\hat {y}_i \) is the predicted value.

The MSE measures the average of the squared differences between predicted values and actual target values, which leads to the following characteristics-

  1. It’s in squared units of the target variable, so it’s not directly intuitive.
  2. MSE penalizes large errors, which means it is sensitive to outliers; if your dataset has extreme values, MSE will reflect them strongly. For example, an error of 10 when squared gives \(10^2=100\), while an error of 100 when squared gives \(100^2=10{,}000\).
  3. The smaller the MSE, the better the model’s prediction accuracy.
  4. Values of MSE may be used for comparing two or more statistical models when multiple models are considered for the dataset. The model with the lowest MSE is deemed better.
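The outlier sensitivity in point 2 can be seen with a small made-up example: two prediction sets with the same total absolute error, one of which concentrates it in a single large miss:

```python
import numpy as np

actual = np.array([10.0, 20.0, 30.0, 40.0])

# both prediction sets have a total absolute error of 20,
# but the second concentrates it in one large miss
even_errors = actual + 5.0                            # four errors of 5
one_outlier = actual + np.array([20.0, 0.0, 0.0, 0.0])  # one error of 20

mse_even = np.mean((actual - even_errors) ** 2)     # (4 * 25) / 4 = 25
mse_outlier = np.mean((actual - one_outlier) ** 2)  # 400 / 4 = 100
print(mse_even, mse_outlier)  # 25.0 100.0
```

Because errors are squared before averaging, the single large error produces a much higher MSE even though the total absolute error is the same.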

MSE calculation example using Numpy in Python

import numpy as np
y_test = [6295, 10698, 13860, 13499, 15750]
y_pred = [5691, 12380, 18371, 15935, 22500]

mse = np.mean(np.square(np.subtract(y_test, y_pred)))
print(mse)

Output

15007931.4

MSE calculation example using sklearn in Python

from sklearn.metrics import mean_squared_error
y_test = [6295, 10698, 13860, 13499, 15750]
y_pred = [5691, 12380, 18371, 15935, 22500]
mse = mean_squared_error(y_test, y_pred)
print(mse)

Output

15007931.4

That's all for this topic Mean Squared Error (MSE) With Python Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Monday, February 9, 2026

Multiple Linear Regression With Example

In the post Simple Linear Regression With Example we saw how to create a Simple linear regression model using the scikit-learn library in Python. In this post we'll see how to create a multiple linear regression model using the scikit-learn library. We'll also go through the steps for data pre-processing and cleaning, feature transformation, encoding categorical data.

Multiple Linear Regression

In simple linear regression we model relationship between one independent variable (predictor) and one dependent variable. Multiple Linear Regression is a fundamental statistical technique used to model the relationship between one dependent variable and multiple independent variables. So, we'll create a model to analyse how multiple features affect the outcome.

Multiple Linear Regression equation

In context of machine learning where we have sample data and we use it to create regression model, multiple linear regression equation is as given below.

$$ \hat {y}=b_0 + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_nx_n $$

Here \(\hat{y}\) is the predicted label - Output

b0 is the intercept, which tells you where the regression line intercepts the Y-axis. Or you can say it is the value when all the predictors x1, x2, x3, .. , xn are zero.

b1, b2, ..., bn are the slopes. Each tells how much the dependent variable changes for a one unit change in the given independent variable when all the other independent variables are held constant. For example, b1 represents the estimated change in \(\hat {y}\) per unit increase in x1 when x2, x3, .. , xn are held constant. To put it another way, if you want to interpret b1, you imagine increasing x1 by 1 unit while keeping x2, x3, .. , xn unchanged. Then the predicted change in \(\hat {y}\) is exactly b1. The same logic applies for b2, b3 and so on.

The residual (difference between actual value and predicted value) term is calculated as \(e_i = y_i - \hat{y}_i\).

In the model these coefficients (b0, b1, b2, …) are chosen to minimize the mean of the squared residuals (the Mean Squared Error).

$$ L=\frac{1}{n}\sum _{i=1}^n(y_i-\hat {y}_i)^2 $$

The goal of the model is to find the best fit line which has the minimum Mean Squared Error.
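As a minimal sketch of these ideas, fitting LinearRegression on made-up, noise-free data recovers the intercept b0 and the slopes b1, b2 used to generate it (the generating values 5, 2, -3 are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical noise-free data generated from y = 5 + 2*x1 - 3*x2
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 2))
y = 5 + 2 * X[:, 0] - 3 * X[:, 1]

reg = LinearRegression().fit(X, y)
print(reg.intercept_)  # b0, ~5.0
print(reg.coef_)       # [b1, b2], ~[2.0, -3.0]
```

With real data the residuals are non-zero, and the fitted coefficients are the ones that minimize the mean squared residual rather than an exact match.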

Multiple linear regression using scikit-learn Python library

Dataset used here can be downloaded from- https://www.kaggle.com/datasets/hellbuoy/car-price-prediction

Goal is to predict the car price based on the given features.

In the implementation code is broken into several smaller units with some explanation in between for data pre-processing steps.

1. Importing libraries and reading CSV file

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('./CarPrice_Assignment.csv')

2. Getting info about the data. The parameter include='all' includes all the columns; otherwise only columns with numerical values are included.

df.describe(include='all')

3. Removing columns

There are 26 columns and 205 rows in the dataset. On analyzing the data, you can observe that "car_ID" is inconsequential and the "CarName" column has 147 unique values; encoding that many unique categorical values will be a problem, so we'll drop these 2 columns. Also, the "enginelocation" column has 'front' as the value for 202 out of 205 rows, making it as good as a constant, so we can drop this column too.

# dropping columns car_id (just a unique id)
# CarName as there are 147 unique car names, encoding it will add lot of columns
df = df.drop(columns=['car_ID', 'CarName'])

# dropping columns enginelocation (as 202 entries are with value 'front')
df = df.drop("enginelocation", axis=1)

4. Removing outliers

We can also check for outliers in the dependent and independent variables because extreme values can disproportionately affect the regression line.

For that we can plot a distribution, if we do it for 'price'

g = sns.displot(df['price'], kde=True, bins=30)
g.set(xlim=(0, None))
[Figure: distribution of price, showing positive skew]

As you can see, there is a positive skew. Let's say we want to avoid top 1% of the prices (treat them as outliers), that can be done using quantile() function in Pandas to return 99th percentile value.

#taking only the 99% records
qnt = df['price'].quantile(0.99)
data_re = df[df['price'] < qnt]

After that if you run the data_re.describe(include='all') code line, you can see that 3 rows are removed and the max price is now 37028. That way we have decreased some of the skewness in the price data.

Same way you can check for some of the independent variables and remove some outliers if needed.

5. Resetting index

If you have removed a few records based on quantile values, you can use reset_index() to rearrange the index of the DataFrame back to the default integer index (0, 1, 2, …), since removing records leaves gaps in the default index.

#resetting indices
data_processed = data_re.reset_index(drop=True)

6. Checking for linear relationship between variables.

You can also plot the independent variables vs price and examine the scatter plots; if a scatterplot looks roughly like a straight line, that suggests a likely linear relationship. In case a relationship doesn't look linear, we may have to use a logarithmic or square root transformation to transform the data.

f, (p1,p2,p3) = plt.subplots(1,3, sharey=True, figsize=(15,3))
p1.scatter(data_processed['enginesize'], data_processed['price'])
p1.set_title('EngineSize and Price')
p2.scatter(data_processed['horsepower'], data_processed['price'])
p2.set_title('HorsePower and Price')
p3.scatter(data_processed['highwaympg'], data_processed['price'])
p3.set_title('HighwayMpg and Price')
[Figure: scatter plots of enginesize, horsepower and highwaympg vs price]

At least for these variables relationship looks linear.

7. Feature and label selection

y = data_processed['price']   # dependent variable
X = data_processed.drop(columns=['price'])   # independent variables

8. Checking for multicollinearity

Multicollinearity in linear regression occurs when two or more independent variables are highly correlated, meaning they provide redundant information, making it difficult to quantify the individual contribution of each independent variable to the dependent variable.

For detecting multicollinearity two of the most used options are-

Correlation Matrix

A correlation matrix is a matrix displaying correlation coefficients for all the possible pairs of predictors. That helps to find relationships between independent variables. Look for high correlation coefficients (e.g., >0.7 or 0.8) between predictors.

Variance Inflation Factor (VIF)

The Variance Inflation Factor (VIF) measures how much the variance of an estimated regression coefficient is increased due to collinearity (correlation) among predictor variables. A high VIF of greater than 10 indicates multicollinearity.

If you want to use correlation matrix to find the high correlation (here it is kept as 0.85) then following code will drop the features.

# select columns with numerical values
v = X.select_dtypes(include ='number')
corr_matrix = v.corr().abs()   # absolute correlations
#corr_matrix
#print(corr_matrix)
upper = corr_matrix.where(
    #upper triangular part of an array
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
)
# get the columns having any corr value > .85
to_drop = [column for column in upper.columns if any(upper[column] > 0.85)]
print(to_drop)
X_reduced = X.drop(columns=to_drop)

If you want to use VIF (upper limit kept as 10) then following code will drop the features. Note that the statsmodels library provides the variance_inflation_factor function to compute VIF for each variable in a regression model.

from statsmodels.stats.outliers_influence import variance_inflation_factor
# select columns with numerical values
v = X.select_dtypes(include ='number')
#select catgorical features
categorical_features = X.select_dtypes(exclude='number').columns
# create a new dataframe
vif = pd.DataFrame()
vif["VIF"] = [variance_inflation_factor(v.values, i) for i in range(v.shape[1])]
vif["Features"] = v.columns
#get the columns where VIF is less than or equal to 10
valid_numeric = vif.loc[vif["VIF"] <= 10, "Features"]
final_features = list(valid_numeric) + list(categorical_features)
X_reduced = X[final_features]

I have used correlation matrix code in this example which drops the following columns.

['carlength', 'curbweight', 'enginesize', 'highwaympg']

9. Splitting and encoding data

Splitting is done using train_test_split where test_size is passed as 0.2, meaning 20% of the data is used as test data whereas 80% of the data is used to train the model. OneHotEncoder is used to encode categorical data. With OneHotEncoder, drop = 'first' parameter is used so that new columns are created for only n-1 unique values, that helps in avoiding dummy variable trap. The parameter handle_unknown = 'ignore' helps with training vs. transform mismatch. The encoder learns categories from the training set. If new categories appear in the test set, they're "unknown". Parameter handle_unknown='ignore' ensures unseen categories don't break the transform step. They'll be encoded as all zeros.

Note that both fitting and transformation (using fit_transform) are done for the training data, whereas only the transform() method is used for the test data. That's how it should be done. When you split data into train and test data, there is a chance some category value appears only in the test data; since the encoder has learned the categories from the training data, the handle_unknown='ignore' parameter covers that case as described above.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer
from sklearn.metrics import r2_score, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.2, random_state=0)

ct = ColumnTransformer([
    ('encoder', OneHotEncoder(sparse_output = False, drop = 'first',handle_unknown = 'ignore'), X_reduced.select_dtypes(exclude='number').columns)
],remainder = 'passthrough')
 
X_train_enc = ct.fit_transform(X_train)
X_test_enc = ct.transform(X_test)

10. Training the model

From sklearn you import the LinearRegression class. Later you have to create an object of this class and call the fit method to train the model, parameters passed to the fit method are training data (X_train in our case) and target values (y_train in our case).

reg = LinearRegression()
reg.fit(X_train_enc, y_train)

11. Once the model is trained, predictions can be made using test data which can then be compared with the actual test data (y_test)

y_pred = reg.predict(X_test_enc)

12. Comparing test and predicted data

df_results = pd.DataFrame({'Target':y_test, 'Predictions':y_pred})
df_results['Residual'] = df_results['Target'] - df_results['Predictions']
df_results['Difference%'] = np.abs((df_results['Residual'] * 100)/df_results['Target'])
print(df_results.head())
        Target   Predictions     Residual  Difference%
18    6295.0   5691.859375   603.140625     9.581265
171  10698.0  12380.265625 -1682.265625    15.725048
107  13860.0  18371.781250 -4511.781250    32.552534
98   13499.0  15935.093750 -2436.093750    18.046476
178  15750.0  22500.046875 -6750.046875    42.857440
print(df_results.describe())

             Target   Predictions     Residual  Difference%
count     41.000000     41.000000    41.000000    41.000000
mean   13564.524390  13802.394436  -237.870046    15.610925
std     7463.439157   6884.600274  2839.126221    11.662503
min     6189.000000   5691.859375 -7951.265625     0.230753
25%     8495.000000   9159.765625 -1665.515625     7.594619
50%    11694.000000  12293.171875  -448.171875    13.121822
75%    15750.000000  16523.468750  1010.000000    21.949032
max    37028.000000  33051.968750  5494.531250    48.020689

As you can see min difference percentage is 0.23 while the max is 48.02.

13. Seeing the model metrics such as R-squared, mean squared error and root mean squared error.

print("R2 score", r2_score(y_test, y_pred)) 
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error", mse)
print("Root Mean Squared Error", np.sqrt(mse))

That's all for this topic Multiple Linear Regression With Example. If you have any doubt or any suggestions to make please drop a comment. Thanks!


R-squared - Coefficient of Determination

This post tries to explain one of the metrics used for regression models which is R2, known as R-squared or coefficient of determination.

What is R-squared

R-squared is a statistical measure, measuring the goodness of fit in the regression models. It measures how well the independent variables explain the variability in the dependent variable.

Value of the R-squared lies between 0 and 1

  1. Value of 0 means the model doesn't explain the variability at all.
  2. Value of 1 means the model explains all of the variability (perfect fit).

Though a value of 1 would most likely suggest overfitting.

Then the question is: what is a good R2 value?

Well, that depends a lot on context. In fields like physics or engineering, if you are creating a mathematical model or regression equation that fits experimental or simulation data, values above 0.9 are often expected, while in social sciences or economics, values around 0.3-0.5 can still be considered meaningful. There's no universal cutoff.

For example, if R2=0.85, then 85% of the variability in y is explained by the model, and 15% remains unexplained.

Another question is what does variability of data mean? If we take the simple regression model equation which is-

\[ \hat{y} = b_{0} + b_{1}X_{1} \]

Then R2 is the metrics that tells us how well the whole simple regression model (the combination of x values and coefficients) explains the variability in y.

Imagine predicting salaries when-

  • x = experience in years
  • b1 = salary increase with each year

Then R2=0.80 means: using "experience in years" as input, the regression line explains 80% of why salaries differ with years of experience, whereas 20% remains unexplained.

Equation for R-squared

The formula for calculating R-squared is

$$ R^2 = 1 - \frac{RSS}{TSS} $$

Where RSS is the residual sum of squares, also called sum of squares of error (SSE). Here, residual is defined as, if actual value is yi and the predicted value is \(\hat{y_i}\) then the residual = \(y_{i} - \hat{y_i}\)

$$ SSE=\sum_{i=1}^{n} (y_i-\hat {y}_i)^2 $$

RSS measures the unexplained variability in the data.

TSS is the total sum of squares, which refers to how spread out the values of the dependent variable (y) are around their mean.

$$ TSS=\sum_{i=1}^{n} (y_{i}-\bar{y})^2 $$

TSS explains the total variability in the data.

[Figure: data points with the regression line and the mean line]

In the above image, a line is also drawn for the mean. For a least-squares fit, the sum of squared differences between the observed values and the mean (TSS) is always at least as large as the sum of squared residuals (RSS). That is why R-squared has a value between 0 and 1.

R-squared example

If we take the same salary dataset used in the simple linear regression example, then the regression equation comes out to-

y_hat = 24848 + 9450*x

and the mean of y values is- 76004

TSS = 21794977852

RSS = 938128552

After computing sum of squares R2 = 1 - (938128552/21794977852) = 0.9570
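This can be checked with a couple of lines of Python, using the sums of squares quoted above:

```python
# recomputing R-squared from the sums of squares quoted above
TSS = 21794977852
RSS = 938128552

r_squared = 1 - RSS / TSS
print(f"{r_squared:.4f}")  # 0.9570
```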

That's all for this topic R-squared - Coefficient of Determination. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Friday, February 6, 2026

Simple Linear Regression With Example

Regression analysis is one of the most used methods for prediction. Within regression analysis, linear regression is considered the starting point of machine learning.

A linear regression is a statistical technique used to find the relationship between variables. There is a dependent variable (target) and one or more independent variables (predictors). In terms of machine learning you want to model the relationship between features and a label. Linear regression assumes this relationship is linear, meaning it can be represented by a straight line.

Simple Linear Regression

Simple linear regression is a linear regression with a single predictor; it models the relationship between two variables.

  1. One variable, generally denoted as x is the independent variable or predictor.
  2. Another variable, generally denoted by y is the dependent variable or target.

For example, if we want to find the relationship between years of experience and salary, then years of experience is the independent variable and salary is the dependent variable, with the understanding that salary tends to increase with experience.

If you have years of experience and salary data, you can use it to fit a simple linear regression model. Once that "learning" is done you can predict the salary by passing the years of experience.

Simple Linear Regression equation

In the context of machine learning, where we have sample data and use it to create a regression model, the simple linear regression equation is as given below.

$$ \hat{y} = b_{0} + b_{1}X_{1} $$

Here \(\hat{y}\) is the predicted label - Output

b0 is the intercept, which tells you where the regression line intercepts the Y-axis. Or you can say it is the predicted value of y when the independent variable (x) is 0.

b1 is the slope. It tells you how much the dependent variable changes for a one-unit change in the independent variable.

Simple Linear Regression

Ordinary Least Squares (OLS) estimation

In the above image, the regression line is labelled as the best-fit line. But how do we know that this line is the best fit? Many straight lines can be drawn through the x values intercepting the y-axis. One way to find the best-fit line is by using ordinary least squares estimation.

Ordinary least squares works by minimizing the sum of the squared differences between the observed values (the actual data points) and the values predicted by the model (lying on the regression line).

If actual value is yi and the predicted value is \(\hat{y_i}\) then the residual = \(y_{i} - \hat{y_i}\).

Squaring these differences ensures that both positive and negative residuals are treated equally. So, the best-fit line is the line for which the sum of the squared residuals (RSS) is minimum.

$$ RSS = \sum_{i=1}^{n} (y_{i}-\hat{y}_{i})^2 $$

OLS
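To make the "minimum RSS" idea concrete, here is a small sketch with made-up data points (an assumption for illustration, not the salary dataset). It computes RSS for the OLS line and for arbitrary alternative lines; the OLS line never has a larger RSS:

```python
import numpy as np

# Made-up data points (illustration only, not the salary dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

def rss(b0, b1):
    # Sum of squared residuals for the line y_hat = b0 + b1*x
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Slope and intercept of the least-squares (OLS) line
b1_ols, b0_ols = np.polyfit(x, y, 1)

# The OLS line never has a larger RSS than any other straight line
print(rss(b0_ols, b1_ols) <= rss(2.5, 0.5))  # True
print(rss(b0_ols, b1_ols) <= rss(0.0, 2.0))  # True
```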

Formula for slope and intercept

The formula for calculating slope is-

\(b_{1}=r*\frac{s_{y}}{s_{x}}\)

Where: \(r\) = Pearson's correlation coefficient between \(x\) and \(y\).

\(s_{y}\) = Standard deviation of the \(y\) variable.

\(s_{x}\) = Standard deviation of the \(x\) variable.

Formula for intercept is

\(b_0=\bar{y} - b_1\bar{x}\) meaning (Mean of \(y\) - Slope \(\times \) Mean of \(x\))

After replacing the value of b1

\( b_0=\bar{y} - r\frac{s_{y}}{s_{x}} \times \bar{x}\)
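These formulas can be verified numerically. A quick sketch with made-up data (an assumption for illustration, not the salary dataset) computes b1 and b0 from r, s_y and s_x, and checks them against NumPy's least-squares fit:

```python
import numpy as np

# Made-up sample data (illustration only, not the salary dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

r = np.corrcoef(x, y)[0, 1]             # Pearson's correlation coefficient
b1 = r * y.std(ddof=1) / x.std(ddof=1)  # slope = r * s_y / s_x
b0 = y.mean() - b1 * x.mean()           # intercept = mean(y) - slope * mean(x)

# Cross-check against NumPy's least-squares fit
slope_np, intercept_np = np.polyfit(x, y, 1)
print(np.isclose(b1, slope_np), np.isclose(b0, intercept_np))  # True True
```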

Simple linear regression by manually calculating slope and intercept

Though the scikit-learn library implements ordinary least squares (OLS) linear regression, and that is the usual way to build a simple linear regression model in Python, let's first do it manually using the formulas above. The code still uses other Python libraries like Pandas, NumPy and Matplotlib.

Salary dataset used here can be downloaded from this URL- https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression

1. Importing libraries and reading CSV file
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# reading dataset from current directory
df = pd.read_csv("./Salary_dataset.csv")
print(df.head())

On printing with the head() function, the first five rows are displayed.

   Unnamed: 0  YearsExperience   Salary
0           0              1.2  39344.0
1           1              1.4  46206.0
2           2              1.6  37732.0
3           3              2.1  43526.0
4           4              2.3  39892.0

As you can see there is also a serial number column named "Unnamed: 0". This column is not needed, so let's drop it.

#Removing serial no column
df = df.drop("Unnamed: 0", axis=1)

2. Calculating the values for equation.

#Calculate mean and standard deviation
mean_year = df['YearsExperience'].mean()
mean_salary = df['Salary'].mean()
std_year = df['YearsExperience'].std()
std_salary = df['Salary'].std()
# correlation coefficient between Years of experience and salary
corr = df['YearsExperience'].corr(df['Salary'])

print(corr) # 0.9782416184887599
# calculate slope
slope = corr * std_salary/std_year
print(slope)
# calculate intercept
intercept = mean_salary - (slope * mean_year)
print(intercept)

3. Predicting values

# get predicted salaries
y_pred = intercept + slope * df['YearsExperience']

# concatenate two pandas series for actual salaries and predicted salaries
combined_array = np.column_stack((df['Salary'].round(2), y_pred.round(2)))
# check displayed values 
print(combined_array)

#Predict salary for given years of experience
sal_pred = intercept + slope * 11
print(sal_pred) # 128797.78950252903

4. Plotting the regression line

#Plot regression line
# Scatter plot for actual values
plt.scatter(df['YearsExperience'], df['Salary'], color='blue', label='Actual')
# Plot the regression line
plt.plot(df['YearsExperience'], y_pred, color='red', label='Regression Line')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.title('Years of experience Vs Salary')
plt.legend()
plt.show()

Simple linear regression using scikit-learn Python library

The above example shows how to calculate slope and intercept manually for linear regression but scikit-learn provides in-built support for creating linear regression model. Let's go through the steps.

1. Importing libraries and data pre-processing

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# read CSV file
df = pd.read_csv("./Salary_dataset.csv")
# remove serial no. column
df = df.drop("Unnamed: 0", axis=1)

2. As a second step, we do feature selection and split the data into two sets: training data and test data. scikit-learn has built-in support for splitting.

# Feature and label selection
X = df['YearsExperience']
y = df['Salary']

As a convention in ML code, capital X is used for the input data because it represents a matrix of features, while lowercase y is used for the target because it is typically a vector. Splitting is done using train_test_split with test_size=0.2, meaning 20% of the data is used as test data whereas 80% of the data is used to train the model.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score 
# splitting data into train data (80%) and test data (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

3. Training the model

From sklearn you import the LinearRegression class, which is an implementation of ordinary least squares linear regression. You then create an object of this class and call its fit method to train the model. The parameters passed to fit are the training data (X_train in our case) and the target values (y_train in our case).

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score 
# splitting data into train data (80%) and test data (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#scikit-learn models require a 2D array input for features (X), even for a single feature
#so reshape(-1, 1) is used to convert to 2D array
X_train_reshaped = X_train.values.reshape(-1, 1)
reg = LinearRegression()

# Train the model on the training data
reg.fit(X_train_reshaped, y_train)

# Print intercept and coefficient
print('Intercept (b0) is', reg.intercept_)
print('Weight (b1) is', reg.coef_[0])

Which gives the following output for intercept and coefficient.

Intercept (b0) is 24380.20147947369
Weight (b1) is 9423.81532303098
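The intercept and coefficient are all the model needs to make a prediction: \(\hat{y} = b_0 + b_1x\). A quick sketch with a tiny made-up dataset (an assumption for illustration, not the salary data) confirms that predict() is exactly this computation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset: exactly y = 2x + 1 (illustration only)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

reg = LinearRegression().fit(X, y)

# predict() is just intercept_ + coef_[0] * x
manual = reg.intercept_ + reg.coef_[0] * 5.0
print(np.isclose(manual, reg.predict([[5.0]])[0]))  # True
```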

4. Once the model is trained, predictions can be made on the test data, and those predictions can then be compared with the actual values (y_test).

# predict values for the test data
y_pred= reg.predict(X_test.values.reshape(-1,1))

combined_data = pd.DataFrame({'Actual Salaries':y_test, 'Predicted Salaries':y_pred})
print(combined_data)

combined_data gives both actual values and predicted values side by side.

    Actual Salaries  Predicted Salaries
27         112636.0       115791.210113
15          67939.0        71499.278095
23         113813.0       102597.868661
17          83089.0        75268.804224
8           64446.0        55478.792045
9           57190.0        60190.699707

5. You can also predict the salary for a given number of years of experience.

#Predict salary for given years of experience
sal_pred = reg.predict([[11]])
print(sal_pred) # [128042.17003281]

6. Evaluating the model using metrics such as R-squared, mean squared error and root mean squared error.

print("R2 score", r2_score(y_test,y_pred)) 
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error", mse)
print("Root Mean Squared Error", np.sqrt(mse))

7. Plotting the regression line

# Scatter plot for actual values
plt.scatter(X_test, y_test, color='blue', label='Actual')
# Plot the regression line
plt.plot(X_test, y_pred, color='red', label='Regression Line')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.title('Years of experience Vs Salary')
plt.legend()
plt.show()
RegressionPlot

That's all for this topic Simple Linear Regression With Example. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page


Related Topics

  1. Python Installation on Windows
  2. Encapsulation in Python
  3. Method Overriding in Python
  4. Multiple Inheritance in Python
  5. Mean, Median and Mode With Python Examples

You may also like-

  1. Passing Object of The Class as Parameter in Python
  2. Local, Nonlocal And Global Variables in Python
  3. Python count() method - Counting Substrings
  4. Python Functions : Returning Multiple Values
  5. Marker Interface in Java
  6. Functional Interfaces in Java
  7. Difference Between Checked And Unchecked Exceptions in Java
  8. Race Condition in Java Multi-Threading

Mean, Median and Mode With Python Examples

This post explains mean, median and mode, which are measures of central tendency and help to summarize the data. A measure of central tendency is a value that identifies the middle position within a set of data.

Here we'll look at how to calculate mean, median and mode and which one is more appropriate in the given scenario.

Mean

Mean, the arithmetic average, is calculated by summing all the values in the data set and dividing by the number of values. If there are n values \(x_1, x_2, \dots, x_n \) then the mean \( \bar{x} \) (x bar) is calculated as:

$$ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n} {n}$$

Using summation notation, the same thing can be written as:

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

For example, if we have a data set of 10 values as given below-

5, 8, 12, 15, 18, 20, 22, 25, 30, 35

Then the sum of the values is-

5 + 8 + 12 + 15 + 18 + 20 + 22 + 25 + 30 + 35 = 190

And the mean is \( \overline{x} \) = 190/10 = 19

Mean is a better choice when data is normally distributed.

When mean is not a better choice

Mean may not be the best choice when data is skewed, as the mean is sensitive to outliers. In skewed data, outliers (very high or low values) can drag the mean away from the center.

For example, if values are- 10, 12, 13, 14, 15, 100

Then the mean is- 164/6 = 27.33

As you can see mean is pulled away from the center because of one extreme value 100. In such cases median is better option.

Median

Median is the middle value in an ordered (ascending or descending) set of data. Formula for median is as given below-

  1. If the dataset has an odd number of values, it is the middle value. $$\left(\frac{n+1}{2}\right)^\text{th}\text{value} $$
  2. If the dataset has an even number of values, it is the average of the two middle values. $$ \text{Median} = \frac{\left(\frac{n}{2}\right)^\text{th}\text{value} + \left(\left(\frac{n}{2}\right)+1\right)^\text{th}\text{value}}{2} $$

For example, in order to calculate median for

5, 15, 18, 20, 22, 35, 8, 12, 25, 30

First sort them in ascending order-

5, 8, 12, 15, 18, 20, 22, 25, 30, 35

Number of values is 10 (even) so the median is-

\(\frac{\left(\frac{10}{2}\right)\text{th value} + \left(\left(\frac{10}{2}\right)+1\right)\text{th value}}{2} = \frac{5^{\text{th}} \text{ value} + 6^{\text{th}} \text{ value}}{2} \) = (18+20)/2 = 19

So, the median of the dataset is 19.
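The odd/even rule above can be sketched as a small Python function (a minimal illustration, not from scratch code in the original post); applied to the same dataset it gives the same answer:

```python
def median(values):
    """Median via the odd/even rule described above."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # middle value
    return (s[mid - 1] + s[mid]) / 2  # average of the two middle values

print(median([5, 15, 18, 20, 22, 35, 8, 12, 25, 30]))  # 19.0
```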

Median responds well to skewed data

Earlier we saw that the mean is sensitive to outliers, whereas the median varies much less.

For example, if values are- 10, 12, 13, 14, 15, 100

Then the median = (13 + 14)/2 = 13.5

Which is close to center.

Mode

The mode is the most frequent value in the dataset. For example, if we have the following list of values

2, 4, 4, 5, 7, 7, 7, 8, 9, 10

Then 7 is the mode as that has the highest frequency 3.

Mode is not sensitive to outliers.

We may have a scenario where all values appear exactly once, meaning there is no mode. We may also have a scenario where two or more values share the highest frequency, meaning there are multiple modes.
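Both scenarios can be seen with multimode() from Python's standard-library statistics module (available since Python 3.8):

```python
from statistics import multimode

# Two values sharing the highest frequency -> multiple modes
print(multimode([1, 2, 2, 3, 3]))  # [2, 3]

# Every value appears exactly once -> all values returned, i.e. no single mode
print(multimode([1, 2, 3]))        # [1, 2, 3]
```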

Mode is the best measure of central tendency when you're dealing with categorical data (non-numerical), or when you want to identify the most common value in a dataset. For example, you want to find the most shopped brand or most preferred colour.

Mode is also useful when you want the most typical value, for example the most bought shoe size.

Shoe sizes: [7, 8, 8, 8, 9, 10]

Here mode = 8 (most common size)

Calculating mean, median, mode using Python libraries

1. The NumPy library has mean and median functions to calculate mean and median. For mode, the SciPy library provides the mode function.

import numpy as np
from scipy import stats
values = [2, 4, 4, 5, 7, 7, 7, 8, 9, 10]
#Mean and Median
mean = np.mean(values)
median = np.median(values)
print('Mean is', mean)
print('Median is', median)
#Mode - stats.mode returns the mode and its count
mode = stats.mode(values)
print('Mode is', mode[0], 'count is', mode[1])

Output

Mean is 6.3
Median is 7.0
Mode is 7 count is 3

2. Using Pandas library which has mean, median and mode functions. You can convert list of values to Pandas series and then calculate mean, median and mode.

import pandas as pd
values = [2, 4, 4, 4, 5, 7, 7, 7, 8, 9, 10]
data = pd.Series(values)
mean = data.mean()
median = data.median()
# returns a Series (which can have multiple modes)
mode = data.mode()
print(f"Mean is {mean:.2f}")
print('Median is', median)
print('Mode is', list(mode))

Output

Mean is 6.09
Median is 7.0
Mode is [4, 7]

That's all for this topic Mean, Median and Mode With Python Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page


Related Topics

  1. Python Installation on Windows
  2. Encapsulation in Python
  3. Method Overriding in Python
  4. R-squared - Coefficient of Determination
  5. Simple Linear Regression With Example

You may also like-

  1. Passing Object of The Class as Parameter in Python
  2. Local, Nonlocal And Global Variables in Python
  3. Python count() method - Counting Substrings
  4. Python Functions : Returning Multiple Values
  5. Marker Interface in Java
  6. Functional Interfaces in Java
  7. Difference Between Checked And Unchecked Exceptions in Java
  8. Race Condition in Java Multi-Threading

Monday, December 1, 2025

Coin Change - Min Number of Coins Needed - Java Program

In this article we'll see how to write a Java program for the coin change problem, which states that "Given an array of coins storing coins of different denominations and an integer sum representing an amount. Find the minimum number of coins needed to make that sum. If that amount of money cannot be made up by any combination of the coins, return -1."

Any coin can be used any number of times.

For example-

1. Input: int coins[] = {9, 6, 5, 1}, sum = 101
Output: 12
How: 9 (10 times) + 6 + 5

2. Input: int coins[] = {1, 2, 5}, sum = 11
Output: 3
How: 5 + 5 + 1

3. Input: int coins[] = {2}, sum = 0
Output: 0

The "coin change" problem is a good example of dynamic programming as you can break this problem into smaller overlapping sub-problems and you can also store the result of those subproblems (memoization) to avoid redundant calculations.

Coin change problem Java Program

The program for finding the minimum number of coins that add to the given sum can be written using

  1. Recursion without any optimization.
  2. You can add memoization to the recursion to make it faster; this is also called the top-down approach in dynamic programming.
  3. You can also use the bottom-up approach, also known as the tabular form. In the bottom-up approach you solve the smallest sub-problems first and use their solutions to arrive at solutions to bigger sub-problems.

We'll write Java programs for all three approaches here.

1. Using recursion

Here the approach is: if, for the selected coin, sum - coin >= 0, i.e. the coin can be used towards the sum, then we recursively call the method with the remaining amount (sum - coin). You also need to take the minimum of the current count and the result of including the selected coin.
count = Math.min(count, res+1);

Here is the complete Java program with the recursive method.

public class MinCoinSum {
  public static void main(String[] args) {
    int coins[] = {9, 6, 5, 1};
    int sum = 10;
    
    int c = minCoinsNeeded(coins, sum);
    System.out.println("Coins needed- " + c);
  }  
  private static int minCoinsNeeded(int[] coins, int sum) {
    if(sum == 0) {
      return 0;
    }
    int count = Integer.MAX_VALUE;
    int res = 0;
    for(int coin : coins) {
      
      if(sum - coin >= 0) {
        res = minCoinsNeeded(coins, sum-coin);				
        if(res != -1) {
          // if coin needed; that is include 
          // scenario (res+1) otherwise you go with count
          count = Math.min(count,  res+1);	
        }
      }
    }
    return (count == Integer.MAX_VALUE) ? -1 : count;
  }
}

Output

Coins needed- 2

Time and space complexity with this approach

With this approach each recursive call tries all coin denominations. If the amount is n and the number of coins is c then the time complexity is O(c^n), i.e. exponential.

Recursion stack depth can be up to n, so the space complexity is O(n).

2. Using memoization with recursion

With the above recursive method there are many repetitive calls. Memoization improves this by caching results for subproblems.

If the target amount is sum then we may need answers for all amounts from 0 up to sum in this form-

dp[i]= minimum coins required to get amount i

So, we need a 1-D array of length sum+1. This array is initialized with -1 to indicate that no value has been stored yet.

import java.util.Arrays;

public class MinCoinSum {

  public static void main(String[] args) {
    int coins[] = {9, 6, 5, 1};
    int sum = 101;

    int[] dp = new int[sum+1];
    Arrays.fill(dp, -1);

    int c = minCoinsNeeded(coins, sum, dp);
    System.out.println("Coins needed- " + c);  
  }
  
  private static int minCoinsNeeded(int[] coins, int sum, int[] dp) {
    if(sum == 0) {
      return 0;
    }
    // value already stored, so return that value
    if(dp[sum] != -1) {
      return dp[sum];
    }
    int count = Integer.MAX_VALUE;
    int res = 0;
    // Go through all the coins
    for (int coin : coins) {
      // if current coin can be used to get the final amount
      if(sum - coin >= 0) {
        // recursively find the minimum coins for the remaining amount
        res = minCoinsNeeded(coins, sum - coin, dp);
        
        if(res != -1) {
          count = Math.min(count,  res+1);
        }
      }
    }
    // Store result in array
    dp[sum] = (count == Integer.MAX_VALUE) ? -1 : count;
    return dp[sum];
  }
}

Output

Coins needed- 12

Time and space complexity with this approach

If the amount is n and the number of coins is c then the time complexity is O(n × c), as there are n subproblems and up to c checks for each subproblem.

Space needed is O(n) for the dp array (length sum+1) plus O(n) for the recursion stack, so the overall space complexity is O(n).

3. With tabulation (Bottom-up) approach

In the tabular form, the logic for the minimum number of coins is iterative, not recursive, which in itself is an optimization. This is also called the bottom-up approach, as we build solutions for all amounts starting from 0 and going up to the given sum.

A dp array of length equal to sum+1 is needed as we go through amount 0..sum.

All entries of dp should be initialized to a large number (at least greater than the sum). The value of dp[0] is equal to 0 (zero coins needed for amount 0). The logic for solution is as given below.

  1. Iterate in an outer loop for the coins
  2. In the inner loop, for each amount i, check if using this coin improves the minimum count.
  3. Compare the scenario where this coin is added, which means dp[i - coin] + 1, with the current dp[i] and take the minimum.
import java.util.Arrays;

public class MinCoinSum {

  public static void main(String[] args) {
    int coins[] = {9, 6, 5, 1};
    int sum = 101;
    
    int c1 = minNoOfCoins(coins, sum);
    System.out.println("Coins needed- " + c1);  
  }
  
  private static int minNoOfCoins(int[] coins, int sum) {
    
    int[] dp = new int[sum+1];
    // Integer.MAX_VALUE causes problem when 1 is added to it
    //Arrays.fill(dp, Integer.MAX_VALUE);
    Arrays.fill(dp, sum+1);
    dp[0] = 0;
    for(int coin : coins) {
      for(int i = coin; i <= sum; i++) {
        dp[i] = Math.min(dp[i], dp[i-coin] + 1);
      }
    }
    return dp[sum] > sum ? -1 : dp[sum];
  }
}

Output

Coins needed- 12

Time and space complexity with this approach

If the amount is n and the number of coins is c, then the outer loop runs once per coin, i.e. c times, and the inner loop runs for each amount up to sum, i.e. n times.

Thus, the time complexity is O(n × c).

A dp array of size sum + 1 is required, so the space complexity is O(n).

That's all for this topic Coin Change - Min Number of Coins Needed - Java Program. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Java Programs Page


Related Topics

  1. Longest Increasing Subsequence Java Program
  2. Exponential Search Program in Java
  3. Greatest Common Divisor (GCD) of Two Numbers Java Program
  4. Detect Cycle in an Undirected Graph Using DFS - Java Program
  5. Fibonacci Series Program in Java

You may also like-

  1. Generating Getters And Setters Using Reflection in Java
  2. Producer-Consumer Java Program Using ArrayBlockingQueue
  3. How to Untar a File in Java
  4. How to Read File From The Last Line in Java
  5. How ArrayList Works Internally in Java
  6. throws Keyword in Java Exception Handling
  7. List in Python With Examples
  8. Spring Bean Life Cycle