FPE Library Documentation

1. Library Overview

The Feature Probability-based Estimation (FPE) library identifies the most important features in a labeled dataset by computing a probability score for each feature. It is intended for feature selection: removing low-scoring features can improve machine learning model performance.

Dataset Conditions

Data Consistency: Each feature must consist entirely of strings or entirely of numeric values. Mixing data types within a single feature is not allowed.
No Missing Data: The feature data must contain no empty cells (e.g., NaN or blanks).
Target Column: The last column must contain the target (label) for the dataset. The library works on labeled datasets only.
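
The conditions above can be checked before calling the library. The following is a minimal sketch using pandas; the helper name check_dataset_conditions is hypothetical and is not part of the FPE library, and the checks are one reading of the conditions listed here.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_dataset_conditions(df: pd.DataFrame) -> list:
    """Hypothetical helper (not part of FPE): return a list of problems found.

    An empty list means the dataset passed every check below.
    """
    problems = []
    if df.shape[1] < 2:
        problems.append("need at least one feature column plus a target column")
        return problems

    features = df.iloc[:, :-1]  # every column except the last
    for name in features.columns:
        column = features[name]
        if column.isna().any():
            problems.append(f"{name}: contains missing values")
            continue
        if is_numeric_dtype(column):
            continue  # entirely numeric
        is_str = column.map(lambda v: isinstance(v, str))
        if not is_str.all():
            problems.append(f"{name}: mixes strings and non-string values")
        elif (column.str.strip() == "").any():
            problems.append(f"{name}: contains blank strings")

    if df.iloc[:, -1].isna().any():
        problems.append("target column contains missing values")
    return problems


good = pd.DataFrame({
    'Feature1': [1, 2, 3],
    'Feature2': ['A', 'B', 'A'],
    'Target': [1, 0, 1],
})
print(check_dataset_conditions(good))  # an empty list: no problems found
```

Returning a list of messages rather than raising on the first failure lets all problems in a dataset be reported in one pass.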

2. Installation

Install the FPE library using pip:

pip install fpe-lib==0.1.2

3. Usage

3.1 Importing the Library

from fpe.fpe import fpefs

3.2 Input Dataset

Features (X): All columns except the last one are considered features.
Target (y): The last column is treated as the target variable.
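
As an illustration of this column convention, the split can be written with positional indexing in pandas. This is a sketch of the convention only, not the library's internal code; move_target_last is a hypothetical helper for datasets whose label is not already in the last column.

```python
import pandas as pd


def move_target_last(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: reorder columns so that `target` is the last one."""
    ordered = [c for c in df.columns if c != target] + [target]
    return df[ordered]


data = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': ['A', 'B', 'A', 'B', 'C'],
    'Target': [1, 0, 1, 0, 1],
})

X = data.iloc[:, :-1]  # features: all columns except the last
y = data.iloc[:, -1]   # target: the last column

print(list(X.columns))  # ['Feature1', 'Feature2']
print(y.name)           # Target
```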

3.3 Example Usage


import pandas as pd
from fpe.fpe import fpefs

# Sample dataset
data = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': ['A', 'B', 'A', 'B', 'C'],
    'Target': [1, 0, 1, 0, 1]
})

# Apply FPEFS
result = fpefs(data)

# View results
print(result)

Output:

    Feature  Probability
0  Feature1         0.75
1  Feature2         0.75

4. Algorithmic Working

4.1 Steps of the Algorithm

Initialization: Loads the dataset and splits it into features (X) and target (y), checking that the data types and structure meet the dataset conditions.
Feature Normalization: Applies Min-Max scaling to numeric features, normalizing them to [0, 1].
Group Rows by Unique Values: For each feature, groups row indices by unique feature value and identifies the target classes that occur in each group.
Analyze Class Coverage: Evaluates the relationship between feature values and target classes to compute partial coverage.
Compute Feature Probabilities: Assigns probability scores to features based on class separation ability.
Return Probabilities: Outputs a DataFrame with feature names and their corresponding probabilities.
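
Two of the steps above, Min-Max scaling and grouping rows by unique feature value, can be sketched with pandas as follows. This is an illustration under stated assumptions, not the library's implementation: the function names min_max_scale and classes_per_value are hypothetical, and the coverage and probability computations are left out because their formulas are not given in this document.

```python
import pandas as pd

data = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': ['A', 'B', 'A', 'B', 'C'],
    'Target': [1, 0, 1, 0, 1],
})
X, y = data.iloc[:, :-1], data.iloc[:, -1]


def min_max_scale(column: pd.Series) -> pd.Series:
    """Scale a numeric column to [0, 1]; a constant column maps to all zeros."""
    span = column.max() - column.min()
    if span == 0:
        return column * 0.0
    return (column - column.min()) / span


def classes_per_value(feature: pd.Series, target: pd.Series) -> dict:
    """Map each unique feature value to the sorted target classes seen with it."""
    grouped = target.groupby(feature).unique()
    return {value: sorted(classes.tolist()) for value, classes in grouped.items()}


scaled = min_max_scale(X['Feature1'])
print(scaled.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

coverage = classes_per_value(X['Feature2'], y)
print(coverage)  # {'A': [1], 'B': [0], 'C': [1]}
```

In this sample every value of Feature2 occurs with exactly one target class, which is the kind of class separation the scoring step is described as rewarding.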