Thursday, March 12, 2026

Interface in Java With Examples

In this tutorial, you’ll dive deep into the concept of Interface in Java, a powerful feature that enables developers to achieve complete abstraction in Java. An interface defines what a class must do but leaves the details of how it should be done to the implementing class. This makes interfaces a cornerstone of clean, modular, and flexible Java programming.


How does an interface differ from a class

Although an Interface in Java looks syntactically similar to a class, it differs in several important ways:

  • Interfaces cannot have instance variables, meaning they don’t maintain state.
  • Methods inside an interface are declared without a body and end with a semicolon.
  • An interface cannot be instantiated, so constructors are not allowed.
  • A class implements an interface rather than extending it.
  • An interface itself can extend multiple interfaces, supporting multiple inheritance in Java.

Please note that Java 8 onward it is possible to add a default implementation to a method in an interface as well as interface static methods, and Java 9 onward even Private Methods in Java Interface. Here we'll discuss interfaces in their normal form.

General form of interface in Java

access_modifier interface name {
 type final_var1 = value;
 type final_var2 = value;
 ----
 ----
 return_type method_name1(param_list);
 return_type method_name2(param_list);
 ----
 ----
}

When no access modifier is specified, an Interface in Java has default access, meaning it is visible only to other members within the same package. However, when an interface is declared as public, it becomes universally accessible and can be used by any program across different packages.

For top-level interfaces only these two access levels (default and public) are permitted. Nested interfaces, however, can be declared as public, private or protected. We'll discuss nested interfaces later in the post.

All variables declared inside an Interface in Java are implicitly public, static, and final. This means they act as constants: they must be initialized at the time of declaration and cannot be modified by the implementing class. Such behavior ensures consistency and prevents accidental changes across different implementations.
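
As a quick illustration, here is a minimal sketch (the MyConstants interface and its value are made up for this example) showing that an interface variable behaves as a public static final constant:

```java
// Hypothetical interface, used only to illustrate interface constants
interface MyConstants {
  int MAX_USERS = 100; // implicitly public static final
}

public class ConstantsDemo implements MyConstants {
  public static void main(String[] args) {
    // read through the interface name; the value is a constant
    System.out.println(MyConstants.MAX_USERS);
    // MAX_USERS = 200; // compiler error: cannot assign a value to a final variable
  }
}
```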

Similarly, all methods within an Interface in Java are implicitly abstract and public. These methods define the contract that any implementing class must fulfill, but they do not provide the actual implementation. By enforcing this rule, interfaces promote clean design, flexibility, and a clear separation of responsibilities in Java applications.

Note that Java 9 onward you can also add private methods to a Java interface. Please refer Private Methods in Java Interface to know more about adding private methods in a Java interface.

Java Interface example

public interface MyInterface {
 int i = 10;
 Number method1();
 void method2(String Id);
}

Here MyInterface has one variable i, which is implicitly public static final. There are two methods: method1 returns Number and method2 has return type void.

Implementing an Interface in Java

In Java, a class uses the implements keyword to implement an interface.

public class MyClass implements MyInterface { 
 public Integer method1() {
  System.out.println("in method 1" + i);  
  return null;
 }

 public void method2(String Id) {
  System.out.println("in method 2");  
 }
 public static void main(String[] args) {
  
 }
}

When implementing methods defined in interfaces there are several important points-

  • A class can implement more than one interface; in that case the interfaces are separated by commas.
     class ClassName implements Interface1, Interface2 {
     }
     
  • The methods of an interface that are implemented by a class must be declared public i.e. their visibility can't be reduced.
  • The signature of the interface method must remain the same while implementing it. For the return type, a subclass of the return type defined in the interface can also be used (a covariant return type). In the above program method1 has the return type Number whereas the implementing method in the class has the return type Integer; that is permissible because Integer is a subclass of Number.
  • A variable initialized in the interface is a constant and its value cannot be changed in the implementing class. For example, in the above program there is a variable i with value 10; if we try to change that value to 20 in the implementing class it will throw the compiler error "The final field MyInterface.i cannot be assigned".
  • Any number of classes can implement an interface and each class is free to provide its own implementation. That's how, using interfaces, Java fully utilizes the "one interface, multiple methods" aspect of polymorphism.
  • A class can extend only one class but can implement many interfaces, since multiple inheritance of classes is not allowed in Java.
  • An interface can extend another interface, similar to the way that a class can extend another class.
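
The points above can be sketched with a minimal example (the Printable and Showable interfaces are made up for illustration) of a class implementing two interfaces separated by a comma:

```java
// Hypothetical interfaces used only for illustration
interface Printable {
  void print();
}

interface Showable {
  void show();
}

// one class implementing two interfaces, separated by a comma
public class MultiImplDemo implements Printable, Showable {
  // implemented methods must be declared public
  public void print() {
    System.out.println("print");
  }

  public void show() {
    System.out.println("show");
  }

  public static void main(String[] args) {
    MultiImplDemo d = new MultiImplDemo();
    d.print();
    d.show();
  }
}
```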

Extending an interface in Java

Just like a class, an interface in Java uses the extends keyword when it inherits from another interface.
An interface, unlike a class, can extend more than one interface. If a class implements an interface that inherits another interface, it must provide implementations for all the methods in the interface's inheritance chain.

Extending an interface Java Example

// Interface
public interface MyInterface {
 int i = 10;
 Number method1();
 void method2(String Id);
}
// extending interface
interface B extends MyInterface{
 void method3();
}

// class implements all methods of MyInterface and B
public class MyClass implements B {
 
 public Integer method1() {
  System.out.println("in method 1" + i);  
  return null;
 }

 public void method2(String Id) {
  System.out.println("in method 2");  
 }

 public void method3() {
  System.out.println("in method 3");  
 }
 public static void main(String[] args) {
  
 }
}

It can be seen that the class MyClass has to implement all the methods in the inheritance chain of the interface.

Partial implementation of interface by a class

If a class implements an interface but does not implement all the methods of that interface then that class must be declared as abstract.

public interface MyInterface {
 void method1();
 String method2(String Id);
}
If a class implements MyInterface without providing bodies for both of its methods, the compiler gives an error stating that the class must implement the abstract methods declared in MyInterface.

But we can declare the class as abstract in that case-

public abstract class AbstractClassDemo implements MyInterface {
 public static void main(String[] args) {
  System.out.println();
 }
}

Nested Interfaces in Java

An interface or a class can contain another interface. Such an interface is known as a nested interface or a member interface in Java.
A nested interface can be declared as public, private or protected. When a nested interface is used outside its enclosing scope, it must be referred to by its fully qualified name i.e. qualified by the name of the class or interface of which it is a member.

Java nested interface Example

// Class with nested interface
class A{
 public interface TestInterface{
  void displayValue(String value);
 }
}

// class implementing the nested interface
class B implements A.TestInterface{
 public void displayValue(String value) {
  System.out.println("Value is " + value);
 }
}

public class MyClass{ 
 public static void main(String[] args) {
  // reference of class B assigned to nested interface
  A.TestInterface obRef = new B();
  obRef.displayValue("hello");
 }
}

Output

Value is hello

Interface and run time polymorphism

As we already know, there can't be an object of an interface, but an interface can be used to create object references. Since run time polymorphism in Java is implemented through super class references, an interface reference can hold references of implementing classes at run time and provide the appropriate functionality.

Run time polymorphism using interface Java Example

Let's assume that in an application there is a need to handle payments done through several modes like cash, cheque, credit card etc., and based on the mode of the payment the functionality may differ.
This can be achieved through an interface where the interface defines a method payment and then several classes implement that interface and provide the functionality for the payment method according to the business needs. That's how using interfaces, Java fully utilizes "one interface, multiple methods" aspect of polymorphism.

public interface PaymentInt {
 public void payment(double amount);
}
// Cash Payment implementation of Payment interface
class CashPayment implements PaymentInt{
 // method implementation according to cash payment functionality
 public void payment(double amount) {
  System.out.println("Cash Payment of amount " + amount);
 }
}

//Cheque Payment implementation of Payment interface
class ChequePayment implements PaymentInt{
 // method implementation according to cheque payment functionality
 public void payment(double amount) {
  System.out.println("Cheque Payment of amount " + amount);  
 }
}

//CreditCard Payment implementation of Payment interface
class CreditCardPayment implements PaymentInt{
 // method implementation according to credit card payment functionality
 public void payment(double amount) {
  System.out.println("CreditCard Payment of amount " + amount);
 }
}

public class PaymentDemo {
 public static void main(String[] args) {
  // Payment interface reference holding the CashPayment obj
  PaymentInt paymentInt = new CashPayment();
  paymentInt.payment(134.67);
  // Payment interface reference holding the CreditCardPayment obj
  paymentInt = new CreditCardPayment();
  paymentInt.payment(2347.89);
  // Payment interface reference holding the ChequePayment obj
  paymentInt = new ChequePayment();
  paymentInt.payment(1567.45);
 }
}

Output

Cash Payment of amount 134.67
CreditCard Payment of amount 2347.89
Cheque Payment of amount 1567.45

It can be seen how, at run time, the reference is changed and the appropriate payment method is called.

Points to note-

  • Interfaces help in achieving full abstraction in Java.
  • For top level interfaces only default and public access modifiers are permitted.
  • All the variables in interface are implicitly public, static and final.
  • All the methods in an interface are implicitly public and abstract.
  • The methods of an interface that are implemented by a class must be declared public.
  • If a class implements an interface but does not implement all the methods of that interface then that class must be declared as abstract.
  • A class can extend only one class, but implement many interfaces.
  • An interface, unlike class, can extend more than one interface.
  • An interface or a class can have another interface. Such an interface is known as nested interface or a member interface.
  • Any number of classes can implement an interface and each class is free to provide their own implementation. That's how using interfaces, Java fully utilizes "one interface, multiple methods" aspect of polymorphism.
  • With Java 8, it is possible to add a default implementation to a method in an interface.

That's all for this topic Interface in Java With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Java Basics Tutorial Page


Related Topics

  1. Marker Interface in Java
  2. Difference Between Abstract Class And Interface in Java
  3. Interface Default Methods in Java 8
  4. Interface Static Methods in Java 8
  5. Core Java Basics Interview Questions And Answers

You may also like-

  1. Polymorphism in Java
  2. final Keyword in Java With Examples
  3. Java Abstract Class and Abstract Method
  4. covariant return type in Java
  5. Varargs (Variable-length Arguments) in Java
  6. Difference between HashMap and ConcurrentHashMap in Java
  7. How to Loop Through a Map in Java
  8. Java Lambda Expression And Variable Scope

Wednesday, March 11, 2026

Matrix Subtraction Java Program

In this tutorial, we will explore how to write a Java Program for Matrix Subtraction, a process where corresponding elements of two matrices are subtracted index by index. Understanding matrix subtraction in Java not only strengthens your grasp of core programming concepts but also prepares you for solving real-world problems that rely on efficient handling of multidimensional data structures.

Understanding Matrix Subtraction

When you subtract two matrices, subtraction is performed element by element, also known as index-wise subtraction. This means the element at position (0,0) in the second matrix is subtracted from the element at (0,0) in the first matrix, the element at (0,1) in the second matrix is subtracted from the element at (0,1) in the first matrix, and so on across all rows and columns.

For example, when subtracting two matrices of order 3x3, each corresponding element is processed individually to produce the resulting matrix.

When working on a Java Program for Matrix Subtraction, it’s important to remember a few key rules to ensure accurate results.

  1. Both matrices must be of the same size (same number of rows and columns).
  2. The resultant matrix will also have the same order as the input matrices.
  3. Each element in the resultant matrix is obtained by subtracting the corresponding element of the second matrix from the first. For example, the element at (0,0) in the first matrix minus the element at (0,0) in the second matrix becomes the element at (0,0) in the resultant matrix.
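
The rules above can be sketched compactly with two fixed 2x2 matrices before moving to the full interactive program (the values here are chosen arbitrarily):

```java
public class MatrixSubtractionSketch {
  // subtract b from a element by element; assumes both have the same order
  static int[][] subtract(int[][] a, int[][] b) {
    int[][] res = new int[a.length][a[0].length];
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < a[0].length; j++) {
        res[i][j] = a[i][j] - b[i][j];
      }
    }
    return res;
  }

  public static void main(String[] args) {
    int[][] a = {{5, 7}, {3, 9}};
    int[][] b = {{2, 8}, {1, 4}};
    int[][] res = subtract(a, b);
    for (int[] row : res) {
      for (int v : row) {
        System.out.print(v + " ");
      }
      System.out.println();
    }
  }
}
```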

Matrix subtraction Java program

 
import java.util.Scanner;

public class MatrixSubtraction {

  public static void main(String[] args) {
    int rowM, colM;
    Scanner in = new Scanner(System.in);
    
    System.out.print("Enter Number of Rows and Columns of Matrix : ");
    rowM = in.nextInt();
    colM = in.nextInt();
        
    int[][] M1 = new int[rowM][colM];
    int[][] M2 = new int[rowM][colM];
    int[][] resMatrix = new int[rowM][colM];
    
    System.out.print("Enter elements of First Matrix : ");
    
    for(int i = 0; i < rowM; i++){
      for(int j = 0; j < colM; j++){
        M1[i][j] = in.nextInt();
      }
    }
    System.out.println("First Matrix : " );
    for(int i = 0; i < rowM; i++){
      for(int j = 0; j < colM; j++){
        System.out.print(" " +M1[i][j]+"\t");
      }
      System.out.println();
    }
        
    System.out.print("Enter elements of Second Matrix : ");
    
    for(int i = 0; i < rowM; i++){
      for(int j = 0; j < colM; j++){
        M2[i][j] = in.nextInt();
      }
    }
    System.out.println("Second Matrix : " );
    for(int i = 0; i < rowM; i++){
      for(int j = 0; j < colM; j++){
        System.out.print(" " +M2[i][j] + "\t");
      }
      System.out.println();
    }
        
    // Subtraction logic 
    for(int i = 0; i < rowM; i++){
      for(int j = 0; j < colM; j++){
        resMatrix[i][j] = M1[i][j] - M2[i][j];
      }
    }
        
    // Printing the result matrix 
    System.out.println("Result Matrix : " );
    for(int i = 0; i < resMatrix.length; i++){
      for(int j = 0; j < colM; j++){
        System.out.print(" " +resMatrix[i][j]+"\t");
      }
      System.out.println();
    }
  }
}

Output

 
Enter Number of Rows and Columns of Matrix : 3 3

Enter elements of First Matrix : 1 3 4 2 5 6 4 3 2

First Matrix : 
 1  3  4 
 2  5  6 
 4  3  2
 
Enter elements of Second Matrix : 2 7 1 0 4 6 9 8 1

Second Matrix : 
 2  7  1 
 0  4  6 
 9  8  1
 
Result Matrix : 
 -1  -4  3 
  2   1  0 
 -5  -5  1 

That's all for this topic Matrix Subtraction Java Program. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Java Programs Page


Related Topics

  1. Matrix Multiplication Java Program
  2. Matrix Addition Java Program
  3. Remove Duplicate Elements From an Array in Java
  4. Factorial Program in Java
  5. Fibonacci Series Program in Java

You may also like-

  1. How to Display Pyramid Patterns in Java - Part2
  2. Zipping Files And Folders in Java
  3. How to Read File From The Last Line in Java
  4. How to Run a Shell Script From Java Program
  5. Object class in Java
  6. Type Wrapper Classes in Java
  7. Lock Striping in Java Concurrency
  8. How ArrayList Works Internally in Java

Python break Statement With Examples

Have you ever found yourself stuck in a loop that just won't quit? In this tutorial, you’ll master the break statement in Python, your primary tool for stopping a for or while loop dead in its tracks. Whether you've found the data you were looking for or hit a specific error condition, the break keyword gives you ultimate control over your program's flow.

How the break Statement in Python Works

When the Python interpreter encounters a break statement, it immediately terminates the current loop. The program "jumps" out of the loop block and moves straight to the first statement after the loop.

Key Use Cases and Best Practices

The break statement in Python is most effective when paired with an if statement. This allows you to exit a loop only when a specific criterion is met, saving valuable processing time.

  • Conditional Exit: Use if to check a condition; if True, trigger the break.
  • Search Optimization: Stop searching a list once the target item is found.
  • Nested Loops: Note that when using a break inside nested loops, Python only terminates the innermost loop where the statement resides. The outer loops will continue running as usual.

break statement Python examples

1- Using break statement with for loop in Python. In the example a tuple is iterated using a for loop to search for a specific number; as soon as that number is found we break out of the for loop.

numbers = (89, 102, 0, 234, 67, 10, 333, 32)
flag = False
for num in numbers:
    if num == 10:
        flag = True
        break
if flag:
    print('Number found in tuple')
else:
    print('Number not found in tuple')

Output

Number found in tuple

As you can see, the break statement is used inside the for loop to break out of the loop as soon as the condition is satisfied (the Search Optimization use case). Note that searching for an element in a tuple can be done in a more compact way, as given below; the above example is more of an illustration of the break statement.

searched_num = 10
if searched_num in numbers:
    print('Number found in tuple')
else:
    print('Number not found in tuple')
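
Incidentally, the flag variable in the first example can also be avoided with Python's for-else construct; the else block runs only when the loop finishes without hitting break:

```python
numbers = (89, 102, 0, 234, 67, 10, 333, 32)

for num in numbers:
    if num == 10:
        print('Number found in tuple')
        break
else:
    # runs only if the loop completed without a break
    print('Number not found in tuple')
```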

2- Using break statement in Python with while loop. In the example there is an infinite while loop that prompts the user for input. The condition here is that the entered number should be greater than 10 to break out of the while loop; otherwise the user is prompted again for valid input.

while True:
   num = int(input("Enter a number greater than 10: "))
   # condition for breaking out of loop
   if num > 10:
       break
   print("Please enter a number greater than 10...")

print("Entered number is", num)

Output

Enter a number greater than 10: 7
Please enter a number greater than 10...
Enter a number greater than 10: 11
Entered number is 11

3- In this example we’ll see how to use break statement with nested loops. There are 2 for loops in the example and the break statement is used in the scope of the inner for loop, so only that loop is exited when the condition is true.

for i in range(1, 6):
    print(i, "- ", end='')
    for j in range(1, 10):
        print('*', end='')
        # print only 5 times then break
        if j >= 5:
            break
    # move to next line now
    print()

Output

1 - *****
2 - *****
3 - *****
4 - *****
5 - *****

As you can see, though the inner loop's range(1, 10) produces the values 1 through 9, the loop breaks out when j’s value reaches 5 because of the break statement. The end='' used in the print statement ensures that the cursor doesn’t move to the next line after printing in each iteration.

That's all for this topic Python break Statement With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page


Related Topics

  1. Python continue Statement With Examples
  2. pass Statement in Python
  3. Python assert Statement
  4. Python First Program - Hello World
  5. Python String split() Method

You may also like-

  1. Name Mangling in Python
  2. Inheritance in Python
  3. Class And Object in Python
  4. Python Generator, Generator Expression, Yield Statement
  5. Java Variable Types With Examples
  6. Ternary Operator in Java With Examples
  7. How to Convert Date And Time Between Different Time-Zones in Java
  8. Spring Constructor Based Dependency Injection

Tuesday, March 10, 2026

ArrayList in Java With Examples

The ArrayList in Java is one of the most popular and versatile collection classes, widely used because of its ability to grow dynamically. Unlike traditional arrays, you don’t need to anticipate in advance how many elements you are going to store in the ArrayList. An ArrayList automatically expands as new items are added, making it ideal for scenarios where data size is unpredictable.

Internally, however, the ArrayList isn’t a magical "elastic" structure. It starts with an underlying array of a default capacity (10 elements). When this limit is exceeded, Java creates a new array that is 1.5 times larger than the original array and efficiently copies the existing elements into it. This resizing strategy ensures smooth performance while offering the flexibility developers love about the ArrayList in Java.
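
From the caller's point of view this growth is invisible; a small sketch (the element count is chosen arbitrarily) shows that adding more than the default capacity of 10 elements just works:

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayListGrowthDemo {
  public static void main(String[] args) {
    // no capacity specified; starts with the default capacity of 10
    List<Integer> numbers = new ArrayList<>();
    // adding 25 elements forces the backing array to be resized automatically
    for (int i = 1; i <= 25; i++) {
      numbers.add(i);
    }
    System.out.println("Size: " + numbers.size()); // Size: 25
  }
}
```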

Refer How does ArrayList work internally in Java to know more about the internal working of ArrayList.


Hierarchy of the ArrayList

To know the hierarchy of java.util.ArrayList you need to know about 2 interfaces and 2 abstract classes.

  • Collection Interface- Collection interface is the core of the Collection Framework. It must be implemented by any class that defines a collection.
  • List interface- List interface extends Collection interface. Apart from extending all the methods of the Collection interface, List interface defines some methods of its own.
  • AbstractCollection- Abstract class which implements most of the methods of the Collection interface.
  • AbstractList- Abstract class which extends AbstractCollection and implements most of the List interface.

ArrayList extends AbstractList and implements List interface too. Apart from List interface, ArrayList also implements RandomAccess, Cloneable, java.io.Serializable interfaces.
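
The hierarchy can be confirmed with a couple of instanceof checks; note that the reference is typed to the List interface, which is the common way to declare an ArrayList:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.RandomAccess;

public class ArrayListHierarchyDemo {
  public static void main(String[] args) {
    List<String> cities = new ArrayList<>();
    cities.add("Delhi");
    // ArrayList is-a Collection and also implements RandomAccess
    System.out.println(cities instanceof Collection);   // true
    System.out.println(cities instanceof RandomAccess); // true
  }
}
```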

Monday, February 23, 2026

Polynomial Regression With Example

In this post we'll see how to use polynomial regression. With simple linear regression or multiple linear regression, a straight-line (linear) relationship between predictors and target is assumed but that may not be the case always with the real-world data. Also, if the scatterplot of the residuals (y_test - y_pred) versus the predicted values (y_pred) shows a curvature or patterns, it suggests that the relationship between predictors and the response is non-linear.

In such cases, a simple linear regression is inadequate, and a more flexible model like polynomial regression can often improve the fit.

Polynomial Regression

Polynomial regression is a kind of linear regression that allows you to model a non-linear relationship between the independent variables (X) and the dependent variable (y) by using polynomial terms of the independent variable(s).

Polynomial regression model for a single predictor, X, is:

$$ y=\beta _0+\beta _1x+\beta _2x^2+\beta _3x^3+\dots +\beta _nx^n+\epsilon$$

where n is called the degree of the polynomial, so the above equation is an n-th degree polynomial. Such a relationship is called quadratic if the degree is 2, cubic if the degree is 3, and so on. Here

  • y is the dependent variable.
  • x is the independent variable.
  • \( \beta _0, \beta _1, \dots , \beta _n \) are the coefficients of the polynomial terms.
  • \(\epsilon\) is the error term.
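
To make the single-predictor case concrete, here is a small NumPy sketch (synthetic, noise-free data with arbitrarily chosen coefficients) that recovers the coefficients of a quadratic with np.polyfit:

```python
import numpy as np

# synthetic quadratic data: y = 2 + 3x + 0.5x^2 (no noise, for clarity)
x = np.linspace(-5, 5, 50)
y = 2 + 3 * x + 0.5 * x ** 2

# fit a degree-2 polynomial; coefficients are returned highest degree first
coeffs = np.polyfit(x, y, 2)
print(coeffs)  # approximately [0.5, 3.0, 2.0]
```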

If there are multiple predictors (like \(x_1, x_2\)), polynomial regression also includes-

  • Powers of each feature (\(x_1^2, x_2^2\))
  • Interaction terms (for example, \(x_1x_2\))

Suppose the predictors are \(x_1, x_2, x_3\). A polynomial regression of degree 2 (quadratic) can be written as:

$$ y=\beta _0+\beta _1x_1+\beta _2x_2+\beta _3x_3+\beta _{11}x_1^2+\beta _{22}x_2^2+\beta _{33}x_3^2+ \\ \beta _{12}x_1x_2+\beta _{13}x_1x_3+\beta _{23}x_2x_3+\epsilon$$

  • \(\beta _0\): intercept
  • \(\beta _i\): linear coefficients
  • \(\beta _{ii}\): quadratic terms (squares of predictors)
  • \(\beta _{ij}\): interaction terms (cross-products between predictors)
  • \(\epsilon\) : error term

Generalized form of polynomial regression is as given below-

For a polynomial of degree d with three predictors:

$$y=\sum _{i+j+k\leq d}\beta _{ijk}\, x_1^i\, x_2^j\, x_3^k+\epsilon $$

One thing to keep in mind about polynomial regression is that, though the features are non-linear transformations of inputs, polynomial regression is still considered linear regression since it is linear in the regression coefficients \(\beta _1, \beta _2, \beta _3 … \beta _n\).
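
This linearity in the coefficients can be seen directly: once the polynomial columns are built by hand, fitting reduces to ordinary least squares on the expanded design matrix. Below is a minimal NumPy sketch on synthetic data with arbitrarily chosen coefficients:

```python
import numpy as np

# synthetic quadratic data: y = 2 + 3x + 0.5x^2
x = np.linspace(-5, 5, 50)
y = 2 + 3 * x + 0.5 * x ** 2

# design matrix with columns [1, x, x^2]: non-linear in x, linear in beta
X = np.column_stack([np.ones_like(x), x, x ** 2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [2.0, 3.0, 0.5]
```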

Polynomial linear regression using scikit-learn Python library

Dataset used here can be downloaded from- https://www.kaggle.com/datasets/rukenmissonnier/manufacturing-data-for-polynomial-regression/data

Goal is to predict the quality rating based on the given features.

In the implementation, the code is broken into several smaller units with some explanation of the data pre-processing steps in between.

1. Importing libraries and reading CSV file

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
df = pd.read_csv('./manufacturing.csv')

The manufacturing.csv file is in the current directory.

2. Getting info about the data.

print(df.info())

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3957 entries, 0 to 3956
Data columns (total 6 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Temperature (°C)                3957 non-null   float64
 1   Pressure (kPa)                  3957 non-null   float64
 2   Temperature x Pressure          3957 non-null   float64
 3   Material Fusion Metric          3957 non-null   float64
 4   Material Transformation Metric  3957 non-null   float64
 5   Quality Rating                  3957 non-null   float64

You can also use the following command to get summary statistics like mean, standard deviation, min and max values for each column.

print(df.describe())

3. Checking for duplicate rows

You can check for duplicate rows in order to remove them if required.

#checking for duplicates
print(df.duplicated().sum()) #0

4. Another check is for missing values

#count the number of missing (null, or NaN) values in each column of a DataFrame
print(df.isnull().sum())

Output

Temperature (°C)                  0
Pressure (kPa)                    0
Temperature x Pressure            0
Material Fusion Metric            0
Material Transformation Metric    0
Quality Rating                    0

So, there are no missing values.

5. Checking for multicollinearity

You can also check for multicollinearity by displaying a correlation heatmap which displays the relationships between variables.

  • Values close to 1 or -1 indicate strong correlations
  • Values close to 0 indicate weak or no correlations

# check for multicollinearity
correlation_matrix = df.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

If you want to remove any columns because of high multicollinearity, the following code can be used (here X is the feature DataFrame created in step 6); in this example no column has been removed.

# select columns with numerical values
v = X.select_dtypes(include ='number')
corr_matrix = v.corr().abs()   # absolute correlations
#corr_matrix
#print(corr_matrix)
upper = corr_matrix.where(
    #upper triangular part of an array
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
)
# get the columns having any corr value > .85
to_drop = [column for column in upper.columns if any(upper[column] > 0.85)]
print(to_drop)
X_reduced = X.drop(columns=to_drop)

6. Feature and label selection

X = df.iloc[:, :-1]
y = df.iloc[:, -1]

Explanation-

X = df.iloc[:, :-1] in this

  • : means "select all rows."
  • :-1 means "select all columns except the last one."

y = df.iloc[:, -1]

  • : means "select all rows."
  • -1 means "select the last column" (negative indexing).
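
The same slicing idea can be sketched with a plain NumPy array (the values are made up): everything but the last column becomes the features and the last column becomes the label.

```python
import numpy as np

# made-up data: two feature columns plus a label column
data = np.array([[1, 2, 10],
                 [3, 4, 20],
                 [5, 6, 30]])

X = data[:, :-1]  # all rows, all columns except the last
y = data[:, -1]   # all rows, last column only

print(X.shape)  # (3, 2)
print(y)        # [10 20 30]
```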

7. Plotting predictor-target relationship using scatter plot to show that it is not linear

#plot predictor-target relationship using scatter plot
features = X.columns
fig, axes = plt.subplots(1, len(features), sharey=True, figsize=(15, 4))
for i, col in enumerate(features):
    #plt.scatter(df[col], df["Quality Rating"])
    sns.scatterplot(x=df[col], y=df["Quality Rating"], ax=axes[i])

    axes[i].set_xlabel(col)
    axes[i].set_title(f"{col} \nvs Quality Rating")
plt.show()

8. Splitting and scaling data

Splitting is done using train_test_split where test_size is passed as 0.2, meaning 20% of the data is used as test data whereas 80% of the data is used to train the model.

As seen in the polynomial regression equation, it creates higher-degree terms (squared, cubic, etc.) from your variables. These terms can grow very large in value, which can skew the results. That is why normalizing your features is important; otherwise features with larger numeric ranges can dominate the model.

Note that both fitting and transformation (using fit_transform) are done for the training data, whereas only the transform() method is used for the test data. That is the correct approach: the scaler's parameters should be learned from the training data only, so that no information from the test set leaks into training.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
#Polynomial Regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.2, random_state = 0)

#scaling values
scaler_X = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)

X_test_scaled = scaler_X.transform(X_test)

9. Polynomial Feature

Next thing is to get the degree of the polynomial. With the help of PolynomialFeatures class in scikit-learn library it becomes very easy to transform your existing features into higher-degree terms.

poly_reg = PolynomialFeatures(degree=2, include_bias=False)

x_poly = poly_reg.fit_transform(X_train_scaled)

The parameter include_bias controls whether a bias (intercept) column of ones is added to the transformed feature matrix. When you use PolynomialFeatures together with LinearRegression by default, LinearRegression(fit_intercept=True) already adds an intercept term to the model. So, if you also set include_bias=True in PolynomialFeatures, you'll end up with a redundant constant column of ones in your design matrix.

10. Fitting the model

lin_reg = LinearRegression()
lin_reg.fit(x_poly, y_train)

You may wonder why LinearRegression is used here. Keep in mind that it is applied to the polynomial features (x_poly), so the model is still linear in its coefficients.

Once the model is trained, predictions can be made using test data which can then be compared with the actual test data (y_test)

# predicting values
y_pred = lin_reg.predict(poly_reg.transform(X_test_scaled))

11. Comparing test and predicted data

# getting the residual percentage
df_results = pd.DataFrame({'Target':y_test, 'Predictions':y_pred})
df_results['Residual'] = df_results['Target'] - df_results['Predictions']
df_results['Difference%'] = np.abs((df_results['Residual'] * 100)/df_results['Target'])
print(df_results.head(10))

Output

      Target  Predictions  Residual  Difference%
3256  100.00       102.00     -2.00         2.00
142   100.00        99.54      0.46         0.46
2623   99.58       103.70     -4.12         4.14
3741  100.00       100.79     -0.79         0.79
2858   99.58       103.68     -4.10         4.11
3137   95.87        93.66      2.22         2.31
2672  100.00        99.01      0.99         0.99
1420  100.00        99.08      0.92         0.92
1669  100.00        98.94      1.06         1.06
1606  100.00        99.25      0.75         0.75

12. Checking the model metrics such as R-squared, mean squared error and root mean squared error.

#Metrics - R-Squared, MSE, RMSE
print("R2 score", r2_score(y_test, y_pred)) 
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error", mse)
print("Root Mean Squared Error", np.sqrt(mse))

13. Plotting residuals vs. predicted values

# Residuals = actual - predicted
residuals = y_test - y_pred

# Scatterplot: residuals vs fitted
sns.scatterplot(x=y_pred, y=residuals)
plt.axhline(0, color='red', linestyle='--')  # reference line at 0
plt.xlabel("Predicted Values (y_pred)")
plt.ylabel("Residuals (y_test - y_pred)")
plt.title("Residuals vs Fitted")
plt.show()
[Figure: Residuals vs Fitted plot for the degree-2 model]

In polynomial regression, the plot of residuals vs. predicted values should look like random, evenly scattered points around the horizontal zero line (y=0). As you can see, the plot above does not show evenly scattered points, and the residuals form a curved pattern.

If the degree is increased to 5 in PolynomialFeatures

poly_reg = PolynomialFeatures(degree=5, include_bias=False)

and the model is refitted, then the plot of residuals vs. fitted values looks as given below.

The curve is less pronounced than in the quadratic case. However, there is still some systematic pattern; the points are not fully randomly scattered.
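Rather than comparing residual plots by eye, you can also loop over candidate degrees and compare test-set R² scores. A hedged sketch on synthetic cubic data (not the dataset used above; in practice you would reuse X_train_scaled and y_train):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# synthetic cubic data with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = 1 + X[:, 0] - 0.5 * X[:, 0] ** 3 + rng.normal(0, 0.2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for degree in (1, 2, 3, 5):
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)
    score = r2_score(y_te, model.predict(poly.transform(X_te)))
    print(f"degree={degree}  test R2={score:.3f}")
```

On this data the score jumps once the degree reaches the true degree of the underlying curve and barely improves after that, which is the usual sign to stop increasing the degree.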

That's all for this topic Polynomial Regression With Example. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page


Related Topics

  1. Python Installation on Windows
  2. Encapsulation in Python
  3. Method Overriding in Python
  4. R-squared - Coefficient of Determination
  5. Mean, Median and Mode With Python Examples

You may also like-

  1. Passing Object of The Class as Parameter in Python
  2. Local, Nonlocal And Global Variables in Python
  3. Python count() method - Counting Substrings
  4. Python Functions : Returning Multiple Values
  5. Marker Interface in Java
  6. Functional Interfaces in Java
  7. Difference Between Checked And Unchecked Exceptions in Java
  8. Race Condition in Java Multi-Threading

Wednesday, February 18, 2026

Mean Squared Error (MSE) With Python Examples

Mean Squared Error (MSE) is one of the most widely used metrics for evaluating the performance of regression models. It evaluates the prediction accuracy by measuring the average squared difference between predicted and actual values.

Mean Squared Error (MSE) equation

The formula for the mean squared error is-

$$ MSE=\frac{1}{n}\sum _{i=1}^n(y_i-\hat {y}_i)^2 $$

Here \(n\) is the total number of observations

\(y_i\) is the actual value

\(\hat {y}_i\) is the predicted value

The MSE measures the average of the squared differences between predicted values and actual target values, which leads to the following characteristics-

  1. It’s in squared units of the target variable, so it’s not directly intuitive.
  2. MSE penalizes large errors. If your dataset has extreme values, MSE will reflect them strongly, which means it is sensitive to outliers. For example, an error of 10 when squared gives \(10^2=100\), while an error of 100 when squared gives \(100^2=10{,}000\).
  3. The smaller the MSE, the better the model’s prediction accuracy.
  4. Values of MSE may be used for comparing two or more statistical models when multiple models are considered for the same dataset. The model with the lowest MSE is deemed better.
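The outlier sensitivity in point 2 is easy to see numerically. The sketch below compares MSE on the same predictions with and without one large error, alongside MAE for contrast (the numbers are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_ok   = np.array([11.0, 19.0, 31.0, 39.0])   # every error is 1
y_out  = np.array([11.0, 19.0, 31.0, 140.0])  # one error of 100

print(mean_squared_error(y_true, y_ok))    # 1.0
print(mean_squared_error(y_true, y_out))   # (1+1+1+10000)/4 = 2500.75
print(mean_absolute_error(y_true, y_out))  # 25.75, far less affected
```

A single large error dominates the MSE entirely, while the mean absolute error grows only linearly with it.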

MSE calculation example using NumPy in Python

import numpy as np
y_test = [6295, 10698, 13860, 13499, 15750]
y_pred = [5691, 12380, 18371, 15935, 22500]

mse = np.mean(np.square(np.subtract(y_test, y_pred)))
print(mse)

Output

15007931.4

MSE calculation example using sklearn in Python

from sklearn.metrics import mean_squared_error
y_test = [6295, 10698, 13860, 13499, 15750]
y_pred = [5691, 12380, 18371, 15935, 22500]
mse = mean_squared_error(y_test, y_pred)
print(mse)

Output

15007931.4

That's all for this topic Mean Squared Error (MSE) With Python Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page


Related Topics

  1. Encapsulation in Python
  2. R-squared - Coefficient of Determination
  3. Mean, Median and Mode With Python Examples
  4. Simple Linear Regression With Example
  5. Multiple Linear Regression With Example


Monday, February 9, 2026

Multiple Linear Regression With Example

In the post Simple Linear Regression With Example we saw how to create a Simple linear regression model using the scikit-learn library in Python. In this post we'll see how to create a multiple linear regression model using the scikit-learn library. We'll also go through the steps for data pre-processing and cleaning, feature transformation, and encoding categorical data.

Multiple Linear Regression

In simple linear regression we model relationship between one independent variable (predictor) and one dependent variable. Multiple Linear Regression is a fundamental statistical technique used to model the relationship between one dependent variable and multiple independent variables. So, we'll create a model to analyse how multiple features affect the outcome.

Multiple Linear Regression equation

In the context of machine learning, where we have sample data and use it to create a regression model, the multiple linear regression equation is as given below.

$$ \hat {y}=b_0 + b_1x_1 + b_2x_2 + b_3x_3 + \ldots + b_nx_n $$

Here \(\hat{y}\) is the predicted label - Output

\(b_0\) is the intercept, which tells you where the regression line intercepts the Y-axis. Or you can say it is the predicted value when all the predictors \(x_1, x_2, x_3, \ldots, x_n\) are zero.

\(b_1, b_2, \ldots, b_n\) are the slopes. A slope tells how much the dependent variable changes for a one-unit change in the given independent variable when all the other independent variables are held constant. For example, \(b_1\) represents the estimated change in \(\hat {y}\) per unit increase in \(x_1\) when \(x_2, x_3, \ldots, x_n\) are held constant. In other words, to interpret \(b_1\), imagine increasing \(x_1\) by 1 unit while keeping \(x_2, x_3, \ldots, x_n\) unchanged; the predicted change in \(\hat {y}\) is exactly \(b_1\). The same logic applies for \(b_2, b_3\) and so on.
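This interpretation can be verified directly: fit a model, bump one feature by 1 unit while holding the other fixed, and the prediction changes by exactly that coefficient. A small sketch on made-up data (the true coefficients 3 and -2 are chosen arbitrarily):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# made-up data: y = 5 + 3*x1 - 2*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 5 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)

reg = LinearRegression().fit(X, y)

base = reg.predict([[1.0, 1.0]])[0]
bumped = reg.predict([[2.0, 1.0]])[0]  # x1 + 1, x2 held constant

print(reg.coef_[0])   # b1, close to 3
print(bumped - base)  # equals b1 exactly, by linearity
```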

The residual (difference between actual value and predicted value) term is calculated as \(e_i = y_i - \hat{y}_i\).

In the model, these slopes (\(b_1, b_2, \ldots\)) are chosen to minimize the mean of the squared residuals, i.e. the Mean Squared Error.

$$ L=\frac{1}{n}\sum _{i=1}^n(y_i-\hat {y}_i)^2 $$

The goal of the model is to find the best fit line which has the minimum Mean Squared Error.

Multiple linear regression using scikit-learn Python library

Dataset used here can be downloaded from- https://www.kaggle.com/datasets/hellbuoy/car-price-prediction

Goal is to predict the car price based on the given features.

In the implementation code is broken into several smaller units with some explanation in between for data pre-processing steps.

1. Importing libraries and reading CSV file

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('./CarPrice_Assignment.csv')

2. Getting info about the data. The parameter include='all' includes all the columns; otherwise only columns with numerical values are included.

df.describe(include='all')

3. Removing columns

There are 26 columns and 205 rows in the dataset. On analyzing the data, you can observe that "car_ID" is inconsequential, and the "CarName" column has 147 unique values; encoding that many unique categorical values would be a problem, so we'll drop these 2 columns. Also, the "enginelocation" column has 'front' as the value for 202 out of 205 rows, making it as good as a constant, so we can drop this column too.

# dropping columns car_id (just a unique id)
# CarName as there are 147 unique car names, encoding it will add lot of columns
df = df.drop(columns=['car_ID', 'CarName'])

# dropping columns enginelocation (as 202 entries are with value 'front')
df = df.drop("enginelocation", axis=1)

4. Removing outliers

We can also check for outliers in the dependent and independent variables because extreme values can disproportionately affect the regression line.

For that we can plot a distribution, if we do it for 'price'

g = sns.displot(df['price'], kde=True, bins=30)
g.set(xlim=(0, None))
[Figure: Distribution of price, showing a long right tail]

As you can see, there is a positive skew. Let's say we want to discard the top 1% of the prices (treat them as outliers); that can be done using the quantile() function in Pandas to get the 99th percentile value.

#taking only the 99% records
qnt = df['price'].quantile(0.99)
data_re = df[df['price'] < qnt]

After that if you run the data_re.describe(include='all') code line, you can see that 3 rows are removed and the max price is now 37028. That way we have decreased some of the skewness in the price data.

Same way you can check for some of the independent variables and remove some outliers if needed.

5. Resetting index

If you have removed a few records based on quantile values, you can use reset_index() to rearrange the index of a DataFrame back to the default integer index (0, 1, 2, …), since removing records leaves gaps in the default index.

#resetting indices
data_processed = data_re.reset_index(drop=True)
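A quick toy sketch of why this matters: filtering leaves gaps in the index, and reset_index(drop=True) rebuilds it (the values below are made up):

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 5000, 200, 7000]})
kept = df[df['price'] < 1000]  # drops rows 1 and 3

print(kept.index.tolist())                         # [0, 2] - gaps remain
print(kept.reset_index(drop=True).index.tolist())  # [0, 1] - clean default index
```

The drop=True argument discards the old index instead of adding it back as a new column.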

6. Checking for linear relationship between variables.

You can also plot the independent variables vs. price and inspect the scatter plots. If a scatter plot looks roughly like a straight line, that indicates a likely linear relationship. In case a relationship doesn't look linear, we may have to apply a logarithmic or square root transformation to the data.
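As a hedged sketch of such a transformation on synthetic right-skewed values (not the car dataset), a log transform pulls a long right tail in and makes the distribution far more symmetric:

```python
import numpy as np
import pandas as pd

# made-up right-skewed (lognormal) values standing in for a skewed column
rng = np.random.default_rng(0)
s = pd.Series(np.exp(rng.normal(0, 1, size=500)))

print(s.skew())          # strongly positive (right-skewed)
print(np.log(s).skew())  # near 0 (roughly symmetric)
```

np.log1p is the safer variant when a column can contain zeros, and np.exp / np.expm1 invert the transform if you need predictions back on the original scale.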

f, (p1,p2,p3) = plt.subplots(1,3, sharey=True, figsize=(15,3))
p1.scatter(data_processed['enginesize'], data_processed['price'])
p1.set_title('EngineSize and Price')
p2.scatter(data_processed['horsepower'], data_processed['price'])
p2.set_title('HorsePower and Price')
p3.scatter(data_processed['highwaympg'], data_processed['price'])
p3.set_title('HighwayMpg and Price')
[Figure: Scatter plots of enginesize, horsepower and highwaympg vs. price]

At least for these variables relationship looks linear.

7. Feature and label selection

y = data_processed['price']   # dependent variable
X = data_processed.drop(columns=['price'])   # independent variables

8. Checking for multicollinearity

Multicollinearity in linear regression occurs when two or more independent variables are highly correlated, meaning they provide redundant information, making it difficult to quantify the individual contribution of each independent variable to the dependent variable.

For detecting multicollinearity two of the most used options are-

Correlation Matrix

A correlation matrix is a matrix displaying correlation coefficients for all the possible pairs of predictors. That helps to find relationships between independent variables. Look for high correlation coefficients (e.g., >0.7 or 0.8) between predictors.

Variance Inflation Factor (VIF)

The Variance Inflation Factor (VIF) measures how much the variance of an estimated regression coefficient is inflated due to collinearity (correlation) among the predictor variables. A VIF greater than 10 is commonly taken to indicate high multicollinearity.
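The VIF for a predictor is \(1/(1-R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the other predictors. A sketch computing it by hand on toy data with deliberate collinearity (this is for illustration; the statsmodels function used later does the same job):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy predictors: x2 is nearly a copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(0, 0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # regress column j on the remaining columns: VIF = 1 / (1 - R^2)
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

# x1 and x2 get very large VIFs, x3 stays near 1
print([round(vif(X, j), 1) for j in range(3)])
```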

If you want to use the correlation matrix to find high correlations (here the threshold is kept as 0.85), then the following code will drop the features.

# select columns with numerical values
v = X.select_dtypes(include='number')
corr_matrix = v.corr().abs()   # absolute correlations
#corr_matrix
#print(corr_matrix)
upper = corr_matrix.where(
    #upper triangular part of an array
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
)
# get the columns having any corr value > .85
to_drop = [column for column in upper.columns if any(upper[column] > 0.85)]
print(to_drop)
X_reduced = X.drop(columns=to_drop)

If you want to use VIF (with the upper limit kept as 10), then the following code will drop the features. Note that the statsmodels library provides the variance_inflation_factor function to compute VIF for each variable in a regression model.

from statsmodels.stats.outliers_influence import variance_inflation_factor
# select columns with numerical values
v = X.select_dtypes(include='number')
# select categorical features
categorical_features = X.select_dtypes(exclude='number').columns
# create a new dataframe
vif = pd.DataFrame()
vif["VIF"] = [variance_inflation_factor(v.values, i) for i in range(v.shape[1])]
vif["Features"] = v.columns
#get the columns where VIF is less than or equal to 10
valid_numeric = vif.loc[vif["VIF"] <= 10, "Features"]
final_features = list(valid_numeric) + list(categorical_features)
X_reduced = X[final_features]

I have used correlation matrix code in this example which drops the following columns.

['carlength', 'curbweight', 'enginesize', 'highwaympg']

9. Splitting and encoding data

Splitting is done using train_test_split, where test_size is passed as 0.2, meaning 20% of the data is used as test data whereas 80% of the data is used to train the model. OneHotEncoder is used to encode the categorical data. With OneHotEncoder, the drop='first' parameter is used so that new columns are created for only n-1 unique values, which helps in avoiding the dummy variable trap.

Note that both fitting and transformation (using fit_transform) are done for the training data, whereas only the transform() method is used for the test data. That's how it should be done: the encoder learns the categories from the training data, so when you split the data there is a chance some value appears only in the test data. When such a value is encountered while transforming the test data, it is unknown to the encoder; the handle_unknown='ignore' parameter ensures unseen categories don't break the transform step and are instead encoded as all zeros.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer
from sklearn.metrics import r2_score, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.2, random_state=0)

ct = ColumnTransformer([
    ('encoder', OneHotEncoder(sparse_output=False, drop='first', handle_unknown='ignore'), X_reduced.select_dtypes(exclude='number').columns)
], remainder='passthrough')

X_train_enc = ct.fit_transform(X_train)
X_test_enc = ct.transform(X_test)

10. Training the model

From sklearn you import the LinearRegression class. Then you create an object of this class and call the fit method to train the model; the parameters passed to the fit method are the encoded training data (X_train_enc in our case) and the target values (y_train in our case).

reg = LinearRegression()
reg.fit(X_train_enc, y_train)

11. Once the model is trained, predictions can be made using the test data, which can then be compared with the actual values (y_test).

y_pred = reg.predict(X_test_enc)

12. Comparing test and predicted data

df_results = pd.DataFrame({'Target':y_test, 'Predictions':y_pred})
df_results['Residual'] = df_results['Target'] - df_results['Predictions']
df_results['Difference%'] = np.abs((df_results['Residual'] * 100)/df_results['Target'])
print(df_results.head())
       Target   Predictions     Residual  Difference%
18    6295.0   5691.859375   603.140625     9.581265
171  10698.0  12380.265625 -1682.265625    15.725048
107  13860.0  18371.781250 -4511.781250    32.552534
98   13499.0  15935.093750 -2436.093750    18.046476
178  15750.0  22500.046875 -6750.046875    42.857440
print(df_results.describe())
             Target   Predictions     Residual  Difference%
count     41.000000     41.000000    41.000000    41.000000
mean   13564.524390  13802.394436  -237.870046    15.610925
std     7463.439157   6884.600274  2839.126221    11.662503
min     6189.000000   5691.859375 -7951.265625     0.230753
25%     8495.000000   9159.765625 -1665.515625     7.594619
50%    11694.000000  12293.171875  -448.171875    13.121822
75%    15750.000000  16523.468750  1010.000000    21.949032
max    37028.000000  33051.968750  5494.531250    48.020689

As you can see min difference percentage is 0.23 while the max is 48.02.

13. Checking the model metrics such as R-squared, mean squared error and root mean squared error.

print("R2 score", r2_score(y_test, y_pred)) 
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error", mse)
print("Root Mean Squared Error", np.sqrt(mse))

That's all for this topic Multiple Linear Regression With Example. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Python Tutorial Page

