
Working with Libraries

NumPy Fundamentals (Array Basics)
  • NumPy is a library for fast numerical computing.
  • Main object: ndarray (n-dimensional array), which stores elements of a single data type.
  • Vectorized array operations are usually much faster than equivalent loops over Python lists.
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([[1, 2], [3, 4]])

print(a)
print(b)
print(a.shape)     # (4,)
print(b.shape)     # (2, 2)
print(a.dtype)
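A minimal sketch of a few more ndarray attributes, and of what "faster for math operations" means in practice: the array applies an operation to every element at once, while a list needs an explicit loop. The dtype is set explicitly here because the default integer type can differ between platforms; the variable names are illustrative only.

```python
import numpy as np

# Explicit dtype so the result does not depend on the platform default.
b = np.array([[1, 2], [3, 4]], dtype=np.int64)

print(b.ndim)       # number of dimensions: 2
print(b.size)       # total number of elements: 4
print(b.dtype)      # int64

# Mixing ints and floats upcasts the whole array to a float dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A Python list needs a loop or comprehension for element-wise math...
doubled_list = [x * 2 for x in [1, 2, 3, 4]]
# ...while an array applies the operation to every element at once.
doubled_arr = np.array([1, 2, 3, 4]) * 2
print(doubled_arr)
```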
NumPy Array Creation (zeros, ones, arange, linspace)
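  • zeros(): array of a given shape filled with 0
  • ones(): array of a given shape filled with 1
  • arange(): like Python's range(), but returns an array (the stop value is excluded)
  • linspace(): a given number of evenly spaced values between start and stop (stop included by default)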
import numpy as np

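z = np.zeros((2, 3))         # 2 rows, 3 columns of 0.0
o = np.ones((2, 3))          # 2 rows, 3 columns of 1.0
r = np.arange(0, 10, 2)      # start 0, stop 10 (excluded), step 2
l = np.linspace(0, 1, 5)     # 5 evenly spaced values from 0 to 1 (included)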

print(z)
print(o)
print(r)
print(l)
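The difference between arange() and linspace() is easiest to see on the same interval. The sketch below uses illustrative values: arange() takes a step and leaves out the stop value, while linspace() takes a count and includes the stop value unless endpoint=False is passed.

```python
import numpy as np

r = np.arange(0, 10, 2)        # step of 2, stop (10) excluded
l = np.linspace(0, 10, 5)      # 5 points, stop (10) included

print(r)   # values 0, 2, 4, 6, 8
print(l)   # values 0, 2.5, 5, 7.5, 10

# endpoint=False makes linspace exclude the stop value as well.
l_open = np.linspace(0, 10, 5, endpoint=False)
print(l_open)  # values 0, 2, 4, 6, 8

# zeros() and ones() return float64 by default; dtype= changes that.
zi = np.zeros((2, 3), dtype=int)
print(zi.dtype)
```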
NumPy Mathematical Operations (Element-wise + Broadcasting)
  • Arithmetic operators work element-wise: +, -, *, /, **
  • Broadcasting allows operations between arrays of different but compatible shapes, for example an array and a single number.
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)        # [11 22 33]
print(a * b)        # [10 40 90]
print(a ** 2)       # [1 4 9]

# Broadcasting
c = np.array([1, 2, 3])
print(c + 5)        # adds 5 to each element
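Broadcasting also applies between arrays, not only between an array and a scalar. A small sketch with made-up values: a 1D array is stretched across the rows of a 2D array, a column vector is stretched across its columns, and shapes that cannot be matched raise a ValueError.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

plus_row = m + row   # row is added to every row of m
plus_col = m + col   # col is added to every column of m
print(plus_row)
print(plus_col)

# Incompatible shapes raise an error instead of guessing.
try:
    m + np.array([1, 2])           # shape (2,) does not match the last dimension (3)
    failed = False
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```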
NumPy Aggregation (sum, mean, max, min)
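  • Aggregation functions: sum(), mean(), max(), min(), std()
  • Without axis, the whole array is reduced to one value. axis=0 aggregates down the rows (one result per column); axis=1 aggregates across the columns (one result per row).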
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())         # 21
print(m.mean())        # 3.5
print(m.max())         # 6

print(m.sum(axis=0))   # column sums
print(m.sum(axis=1))   # row sums
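The axis argument works the same way for the other aggregation functions. A short sketch on the same matrix; note that std() computes the population standard deviation (ddof=0) unless told otherwise.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_means = m.mean(axis=0)   # one value per column: 2.5, 3.5, 4.5
row_means = m.mean(axis=1)   # one value per row: 2.0, 5.0
row_mins = m.min(axis=1)     # smallest value in each row: 1, 4
overall_std = m.std()        # population standard deviation of all six values

print(col_means)
print(row_means)
print(row_mins)
print(overall_std)
```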
Pandas Series (Basics)
  • A Series is a 1D labeled array.
  • Index labels are generated automatically (0, 1, 2, ...) unless custom labels are passed with index=.
import pandas as pd

s1 = pd.Series([10, 20, 30])
s2 = pd.Series([80, 90, 85], index=["A", "B", "C"])

print(s1)
print(s2)
print(s2["B"])
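A brief sketch of the two explicit ways to read a value from a Series, plus filtering and a dict-based constructor. The data mirrors the example above; .loc looks up by label and .iloc by position.

```python
import pandas as pd

s = pd.Series([80, 90, 85], index=["A", "B", "C"])

print(s.loc["B"])    # label-based access: 90
print(s.iloc[0])     # position-based access: 80

above_80 = s[s > 80] # boolean filtering keeps B and C
print(above_80)
print(s.mean())      # 85.0

# A dict's keys become the index labels.
from_dict = pd.Series({"A": 80, "B": 90})
print(from_dict)
```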
Pandas DataFrame (Basics)
  • A DataFrame is a 2D labeled table (rows and columns); each column is a Series.
  • It can be created from a dictionary, a list of dicts, or a CSV file.
import pandas as pd

data = {
    "Name": ["Sourav", "Amit", "Rita"],
    "Age": [25, 22, 23],
    "Marks": [80, 90, 85]
}

df = pd.DataFrame(data)   # dict keys become column names
print(df)
print(df["Name"])         # a single column is returned as a Series
print(df.head())
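The other two construction routes mentioned above can be sketched as follows. The CSV content is an in-memory string standing in for a real file, so the example runs without any file on disk; with an actual file you would pass its path to read_csv() instead.

```python
import io
import pandas as pd

# From a list of dicts: each dict is one row.
rows = [
    {"Name": "Sourav", "Age": 25, "Marks": 80},
    {"Name": "Amit", "Age": 22, "Marks": 90},
]
df_rows = pd.DataFrame(rows)

# From CSV: an in-memory string stands in for a file path here.
csv_text = "Name,Age,Marks\nRita,23,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_rows.shape)            # (2, 3)
print(df_csv.loc[0, "Marks"])   # 85

# Row filtering with a boolean mask.
top = df_rows[df_rows["Marks"] > 85]
print(top)
```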
Data Cleaning (Missing Values + Drop + Fill)
  • Check for missing values: isna() (isnull() is an alias)
  • Remove rows with missing values: dropna()
  • Fill missing values: fillna() with a constant such as 0 or a statistic such as the column mean
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, np.nan, 90]
})

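print(df.isna())       # True where a value is missing

df2 = df.fillna(0)     # returns a new DataFrame; df itself is unchanged
print(df2)

df3 = df.dropna()      # drops the row for "B", which has a missing mark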
print(df3)
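Filling with the column mean, mentioned in the list above, might look like the sketch below. It reuses the same data; mean() skips NaN by default, and passing a dict to fillna() limits the fill to the named column.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [80, np.nan, 90]
})

missing_per_column = df.isna().sum()        # Marks has 1 missing value
mean_marks = df["Marks"].mean()             # NaN is skipped: (80 + 90) / 2 = 85.0
filled = df.fillna({"Marks": mean_marks})   # fill only the Marks column

print(missing_per_column)
print(filled)
```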
Data Cleaning (Duplicates + Rename + Type Conversion)
  • Remove duplicate rows: drop_duplicates()
  • Rename columns: rename(columns={...})
  • Convert a column's data type: astype()
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Marks": ["80", "80", "90"]
})

df = df.drop_duplicates()                    # removes the repeated ("A", "80") row
df = df.rename(columns={"Marks": "Score"})   # old name -> new name
df["Score"] = df["Score"].astype(int)        # strings "80", "90" become integers

print(df)
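astype(int) raises a ValueError if any string cannot be parsed as a number. As a hedged alternative for messy data, pd.to_numeric() with errors="coerce" turns unparseable entries into NaN. The "n/a" entry below is a hypothetical bad value added for illustration; the sketch also shows drop_duplicates() restricted to one column with subset=.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Marks": ["80", "n/a", "90"]   # "n/a" is a hypothetical bad entry
})

# astype(int) would fail on "n/a"; to_numeric can coerce it to NaN instead.
df["Marks"] = pd.to_numeric(df["Marks"], errors="coerce")

# subset= compares only the listed columns; the first occurrence is kept by default.
unique_names = df.drop_duplicates(subset="Name")

print(df)
print(unique_names)
```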
Aggregation and GroupBy (sum, mean, count)
  • groupby() splits the rows into groups by the values of a column and aggregates each group separately.
  • Common aggregates: sum(), mean(), count(), min(), max().
import pandas as pd

df = pd.DataFrame({
    "Dept": ["CSE", "CSE", "IT", "IT"],
    "Marks": [80, 90, 70, 85]
})

print(df.groupby("Dept")["Marks"].mean())
print(df.groupby("Dept")["Marks"].sum())
print(df.groupby("Dept")["Marks"].count())
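Instead of calling groupby() once per statistic, several aggregates can be computed in one pass with agg(). A minimal sketch on the same data; the result is a DataFrame with one row per department and one column per aggregate.

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": ["CSE", "CSE", "IT", "IT"],
    "Marks": [80, 90, 70, 85]
})

# One row per Dept, one column per aggregate function.
summary = df.groupby("Dept")["Marks"].agg(["mean", "sum", "count"])
print(summary)
```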
Matplotlib Visualization (Line and Bar)
  • Matplotlib is a library for plotting graphs in Python.
  • Common plots: line (plot), bar (bar), scatter (scatter), histogram (hist).
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]

plt.plot(x, y)
plt.title("Line Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

names = ["A", "B", "C"]
marks = [80, 90, 85]

plt.bar(names, marks)
plt.title("Bar Chart")
plt.xlabel("Name")
plt.ylabel("Marks")
plt.show()
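The same two charts can also be drawn side by side in one figure using Matplotlib's object-oriented interface. This is a sketch under two assumptions: the "Agg" backend is selected so the code runs without a display, and the output file name is hypothetical. In an interactive session you could drop the backend line and call plt.show() instead of saving.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# One figure with two axes arranged in a single row.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot([1, 2, 3, 4], [10, 20, 15, 25], marker="o")
ax_line.set_title("Line Plot")
ax_line.set_xlabel("X")
ax_line.set_ylabel("Y")

ax_bar.bar(["A", "B", "C"], [80, 90, 85])
ax_bar.set_title("Bar Chart")
ax_bar.set_xlabel("Name")
ax_bar.set_ylabel("Marks")

fig.tight_layout()
fig.savefig("line_and_bar.png")   # hypothetical output file name
```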
Matplotlib Visualization (Scatter and Histogram)
  • scatter() shows the relationship between two variables.
  • hist() shows the frequency distribution of a single variable by counting values in bins.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

data = [10, 12, 12, 13, 15, 15, 15, 18, 20]

plt.hist(data, bins=5)
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
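To see the numbers behind the histogram above, the same binning can be reproduced with np.histogram(), which Matplotlib's hist() uses to bin the data. With bins=5, the range from 10 to 20 is split into five equal-width bins; the last bin includes its right edge.

```python
import numpy as np

data = [10, 12, 12, 13, 15, 15, 15, 18, 20]

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(data, bins=5)

print(edges)    # bin edges: 10, 12, 14, 16, 18, 20
print(counts)   # values per bin: 1, 3, 3, 0, 2
```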