Table of Contents¶

  • 1. Required Libraries
  • 2. Initial Data Analysis
    • 2.1. Dataset Overview and Summary
  • 3. Multi-use functions for Word2Vec
  • 4. Word2Vec Pretrained Vector
    • 4.1. Observations
  • 5. LLM (DistilBERT) Pretrained Model
    • 5.1 Observations
  • 6. Model Comparison: Word2Vec vs. DistilBERT

Required Libraries¶

In [ ]:
!pip install --upgrade --force-reinstall --no-cache-dir numpy pandas gensim datasets evaluate
Successfully installed aiohappyeyeballs-2.6.1 aiohttp-3.12.13 aiosignal-1.3.2 attrs-25.3.0 certifi-2025.6.15 charset_normalizer-3.4.2 datasets-3.6.0 dill-0.3.8 evaluate-0.4.4 filelock-3.18.0 frozenlist-1.7.0 fsspec-2025.3.0 gensim-4.3.3 hf-xet-1.1.5 huggingface-hub-0.33.0 idna-3.10 multidict-6.5.0 multiprocess-0.70.16 numpy-1.26.4 packaging-25.0 pandas-2.3.0 propcache-0.3.2 pyarrow-20.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.2 requests-2.32.4 scipy-1.13.1 six-1.17.0 smart-open-7.1.0 tqdm-4.67.1 typing-extensions-4.14.0 tzdata-2025.2 urllib3-2.5.0 wrapt-1.17.2 xxhash-3.5.0 yarl-1.20.1
In [ ]:
import pandas as pd
from datasets import load_dataset
import nltk
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import gensim.downloader as api
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences, to_categorical
from tensorflow.keras.layers import Embedding, LSTM, Bidirectional, Dense
from tensorflow.keras import Sequential
from tensorflow.keras.metrics import AUC, SparseCategoricalAccuracy
from tensorflow.keras.callbacks import EarlyStopping
import numpy as np
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, DataCollatorWithPadding, create_optimizer, pipeline
from datasets import Dataset
import evaluate
from transformers.keras_callbacks import KerasMetricCallback
import gc
from scipy.special import softmax

Initial Data Analysis¶

In [ ]:
nltk.download("stopwords")
nltk.download("punkt_tab")
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
Out[ ]:
True
In [ ]:
dataset = load_dataset("SetFit/sst5")
/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Generating train split:   0%|          | 0/8544 [00:00<?, ? examples/s]
Generating validation split:   0%|          | 0/1101 [00:00<?, ? examples/s]
Generating test split:   0%|          | 0/2210 [00:00<?, ? examples/s]
In [ ]:
train = pd.DataFrame(dataset["train"])
test = pd.DataFrame(dataset["test"])
val = pd.DataFrame(dataset["validation"])
In [ ]:
def initial_analysis(data):

  print(f"************1st five in the dataset************ \n {data.head()}")
  print(f"************Summary Stat************ \n {data.describe()}")
  print(f"************Count of missing Values************\n {data.isnull().sum()}")
  print(f"************Dataset shape************\n {data.shape}")
  print(f"************Duplicated rows count************ \n {data.duplicated().sum()}")
  print(f'************Unique label count************ \n {round(data["label_text"].value_counts(normalize=True)*100, 2)}')
In [ ]:
initial_analysis(train)
************1st five in the dataset************ 
                                                 text  label     label_text
0  a stirring , funny and finally transporting re...      4  very positive
1  apparently reassembled from the cutting-room f...      1       negative
2  they presume their audience wo n't sit still f...      1       negative
3  the entire movie is filled with deja vu moments .      2        neutral
4  this is a visually stunning rumination on love...      3       positive
************Summary Stat************ 
              label
count  8544.000000
mean      2.058052
std       1.281570
min       0.000000
25%       1.000000
50%       2.000000
75%       3.000000
max       4.000000
************Count of missing Values************
 text          0
label         0
label_text    0
dtype: int64
************Dataset shape************
 (8544, 3)
************Duplicated rows count************ 
 10
************Unique label count************ 
 label_text
positive         27.18
negative         25.96
neutral          19.01
very positive    15.07
very negative    12.78
Name: proportion, dtype: float64
In [ ]:
def nlp_initial_analysis(data):

  stop_words = set(stopwords.words("english"))
  stop_words.update(["movie", "film", "rrb", "lrb"])

  tokens = [r.lower() for text in data["text"] for r in word_tokenize(text) if r.lower() not in stop_words and r.isalnum()]
  freqdist = FreqDist(tokens)
  top10_words = freqdist.most_common(10)
  word, count = zip(*top10_words)
  print(f"Total number of unique words is {len(freqdist)}, and the total number of words is {sum(freqdist.values())}")

  plt.figure(figsize=(12,8))
  plt.bar(word, count)
  plt.title("Top 10 most frequent words")
  plt.show()


  reviews_label = list(data["label_text"].value_counts().index)
  for i in reviews_label:

    review_type = data[data["label_text"] == i]
    plt.figure(figsize=(12,8))

    text = " ".join(reviews.lower() for reviews in review_type["text"])
    wordcloud = WordCloud(stopwords=stop_words).generate(text)
    plt.title(f"WordCloud for {i}")
    plt.imshow(wordcloud)
    plt.show()



  plt.pie(data["label_text"].value_counts(normalize=True), labels=data["label_text"].value_counts(normalize=True).index, autopct="%1.1f%%")
  plt.title("Classification percentage")
  plt.show()
In [ ]:
nlp_initial_analysis(train)
Total number of unique words is 14703, and the total number of words is 74949

📝 Dataset Overview and Summary¶

The summary statistics show labels spanning the full 0–4 range with a mean near 2.06, indicating that the dataset covers all five sentiment classes with a relatively balanced spread.


Count of Missing Values¶

No missing values were found in the dataset:

Column       Missing Count
text         0
label        0
label_text   0

This confirms the dataset is clean and ready for further processing.


Dataset Shape¶

  • The dataset contains 8544 rows and 3 columns, indicating a substantial sample size for modeling and analysis.

Duplicated Rows Count¶

  • A total of 10 duplicated rows were found in the dataset.
  • These duplicates can be removed to improve data quality before training.
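
Dropping these duplicates is a one-line pandas operation; a minimal sketch with toy rows (not the actual SST-5 data):

```python
import pandas as pd

# Toy frame containing one exact duplicate row
df = pd.DataFrame({
    "text": ["great film", "great film", "dull plot"],
    "label": [4, 4, 0],
})

# drop_duplicates keeps the first occurrence of each repeated row
deduped = df.drop_duplicates().reset_index(drop=True)
print(len(df), "->", len(deduped))  # 3 -> 2
```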

Unique Label Count¶

The distribution of sentiment classes based on the label_text column:

Label Text       Proportion (%)
Positive         27.18
Negative         25.96
Neutral          19.01
Very Positive    15.07
Very Negative    12.78

From this, we observe:

  • The dataset is relatively balanced, with a slight skew towards positive and negative sentiments.
  • All five sentiment categories are well-represented, which supports multi-class classification tasks.

📌 Conclusion:
The dataset is clean, fairly balanced, and ready for exploratory data analysis (EDA), text preprocessing, and sentiment modeling tasks.

WordCloud Sentiment Analysis¶

🔹 Neutral Sentiment¶

  • Dominant Words: one, like, story, time, character, make, little
  • Observations:
    • The language tends to be descriptive and observational.
    • Common words such as "story", "character", and "plot" suggest a focus on narrative structure without strong emotional weight.
    • Words like "good", "enough", and "feel" indicate slight leanings toward judgment, but not definitively polarized.

🔻 Negative Sentiment¶

  • Dominant Words: like, character, bad, even, story, work, little, hard, feel
  • Observations:
    • The word "bad" stands out as an explicitly negative term.
    • Words such as "hard", "little", "nothing", and "problem" suggest dissatisfaction with aspects like story or character development.
    • While some neutral words like "character" and "story" appear, they are framed in more critical contexts.

🔺 Positive Sentiment¶

  • Dominant Words: like, one, work, make, good, story, character, performance, love
  • Observations:
    • There’s a clear presence of positively connoted words such as "good", "love", "performance", and "great".
    • Mentions of "funny", "family", and "heart" suggest emotional and engaging content.
    • Words such as "work" and "make" reflect appreciation for effort and execution.

🔻 Very Negative Sentiment¶

  • Dominant Words: bad, even, dull, minute, character, story, plot, worst, nothing, seem
  • Observations:
    • Strongly negative terms like "bad", "dull", and "worst" dominate the cloud, showing clear dissatisfaction.
    • Frequent use of "minute", "hour", and "time" may suggest boredom or a sense of wasted time.
    • Critical references to "character", "story", and "plot" imply dissatisfaction with the film's core elements.
    • The presence of "even", "seem", and "could" indicates unmet expectations or disappointment.

🔺 Very Positive Sentiment¶

  • Dominant Words: performance, best, funny, work, make, love, story, character, year, comedy
  • Observations:
    • Words like "performance", "best", and "love" reflect strong admiration and praise.
    • Mentions of "funny", "comedy", and "entertaining" suggest a joyful, engaging experience.
    • Frequent appearances of "character", "story", and "director" highlight appreciation of narrative and craftsmanship.
    • Use of "year", "great", and "well" shows the film stood out as a high point among others.

📌 Summary¶

Across all sentiments, common thematic words include "story", "character", and "like", which reflects their central importance in reviews regardless of polarity. The sentiment-specific modifiers (e.g., bad, love, hard, funny) help distinguish the emotional direction of each review.
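
The sentiment-specific modifiers behind these clouds can also be surfaced numerically; a minimal sketch using `collections.Counter` over toy reviews (not the SST-5 corpus):

```python
from collections import Counter

# Toy tokenized reviews grouped by sentiment label (illustrative only)
reviews = {
    "positive": ["a funny and warm story", "great performance , great story"],
    "negative": ["a dull and bad story", "bad plot , nothing works"],
}
stop = {"a", "and", ","}

for label, texts in reviews.items():
    # Count non-stopword tokens for this sentiment class
    counts = Counter(w for t in texts for w in t.split() if w not in stop)
    print(label, counts.most_common(3))
```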

Multi-use functions for Word2Vec¶

In [ ]:
google_news_model = api.load("word2vec-google-news-300")
[==================================================] 100.0% 1662.8/1662.8MB downloaded
In [ ]:
def remove_stopwords(data):

  stop_words = set(stopwords.words("english"))
  text = " ".join(review for review in word_tokenize(data) if review.lower() not in stop_words and review.isalnum())
  return text
In [ ]:
train["text"] = train["text"].apply(remove_stopwords)
test["text"] = test["text"].apply(remove_stopwords)
val["text"] = val["text"].apply(remove_stopwords)
In [ ]:
def vocab_len_size(data, coverage_threshold):

  # Fit a throwaway tokenizer to measure the vocabulary size and the
  # sequence length that covers `coverage_threshold` percent of the texts.
  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(data)
  vocab_size = len(tokenizer.word_counts)
  sequences = tokenizer.texts_to_sequences(data)

  seq_lengths = sorted(len(seq) for seq in sequences)
  max_len = np.percentile(seq_lengths, coverage_threshold)

  return int(max_len), vocab_size
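
The coverage-threshold idea in `vocab_len_size` is a percentile cutoff: instead of padding to the longest review, pick the length that covers most sequences and truncate the outliers. A toy illustration (not the notebook's data):

```python
import numpy as np

# Toy sequence lengths: mostly short reviews plus one long outlier
lengths = [4, 5, 5, 6, 6, 7, 8, 9, 12, 40]

# The 90th-percentile length covers ~90% of sequences while ignoring the outlier
max_len = int(np.percentile(sorted(lengths), 90))
print(max_len)  # 14
```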
In [ ]:
max_len, vocab_size = vocab_len_size(train["text"], 98)
oov_token = "<OOV>"
pad_type = "post"
trunc_type = "post"
embedding_size = 300
In [ ]:
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)
tokenizer.fit_on_texts(train["text"])
In [ ]:
def texts_to_sequence(data):
  # Convert texts to integer sequences, then pad/truncate them to max_len.
  text_to_sequence = tokenizer.texts_to_sequences(data)
  padded = pad_sequences(text_to_sequence, maxlen=max_len, padding=pad_type, truncating=trunc_type)

  return padded
In [ ]:
train_padded = texts_to_sequence(train["text"])
test_padded = texts_to_sequence(test["text"])
val_padded = texts_to_sequence(val["text"])
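With `padding="post"` and `truncating="post"`, `pad_sequences` appends zeros up to `max_len` and cuts from the end; a pure-Python sketch of that behaviour:

```python
def pad_post(sequences, maxlen):
    # Post-truncate to maxlen, then post-pad with 0 (the reserved index).
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]

print(pad_post([[4, 8, 15], [16, 23, 42, 7, 9]], maxlen=4))
# → [[4, 8, 15, 0], [16, 23, 42, 7]]
```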
In [ ]:
def embedding_vector():

  # Build the embedding matrix: row i holds the Google News vector for the
  # word with tokenizer index i; words absent from Word2Vec stay as zeros.
  embedding_matrix = np.zeros((vocab_size, embedding_size))

  for word, i in tokenizer.word_index.items():
    if i < vocab_size:
      try:
        embedding_matrix[i] = google_news_model[word]
      except KeyError:
        pass  # word not in the pretrained vocabulary; leave as zeros

  return embedding_matrix
In [ ]:
pretrained_vector = embedding_vector()
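The lookup can be illustrated with a toy vocabulary and a hypothetical 3-dimensional "pretrained" dict standing in for the 300-d `google_news_model`:

```python
import numpy as np

# Hypothetical pretrained vectors (stand-in for the Google News model).
toy_vectors = {"good": [0.1, 0.2, 0.3], "bad": [-0.1, -0.2, -0.3]}
toy_word_index = {"<OOV>": 1, "good": 2, "bad": 3, "zorble": 4}  # 1-based, like Keras

vocab, dim = 5, 3
matrix = np.zeros((vocab, dim))
for word, i in toy_word_index.items():
    if i < vocab and word in toy_vectors:  # words without a vector stay all-zero
        matrix[i] = toy_vectors[word]

print(matrix[2])  # pretrained vector for "good"
print(matrix[4])  # "zorble" has no pretrained vector → zeros
```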
In [ ]:
y_train_encoded = to_categorical(train["label"], num_classes=5)
y_val_encoded = to_categorical(val["label"], num_classes=5)
y_test_encoded = to_categorical(test["label"], num_classes=5)
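`to_categorical` just one-hot encodes the integer labels; an equivalent with `np.eye`, assuming labels in 0–4:

```python
import numpy as np

labels = np.array([0, 4, 2])   # example sentiment labels in 0..4
one_hot = np.eye(5)[labels]    # same result as to_categorical(labels, num_classes=5)
print(one_hot)
```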

Word2Vec Pretrained Vector¶

In [ ]:
model1 = Sequential()
model1.add(Embedding(input_dim=vocab_size, output_dim=embedding_size, weights=[pretrained_vector], trainable=False))
model1.add(Bidirectional(LSTM(128, return_sequences=True)))
model1.add(Bidirectional(LSTM(128, return_sequences=True)))
model1.add(Bidirectional(LSTM(128)))
model1.add(Dense(5, activation="softmax"))
model1.compile(loss="categorical_crossentropy", optimizer="adam", metrics=[AUC(multi_label=True, name="val_auc")])
model1.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ embedding_1 (Embedding)         │ ?                      │     4,411,500 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ bidirectional_3 (Bidirectional) │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ bidirectional_4 (Bidirectional) │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ bidirectional_5 (Bidirectional) │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ ?                      │   0 (unbuilt) │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 4,411,500 (16.83 MB)
 Trainable params: 0 (0.00 B)
 Non-trainable params: 4,411,500 (16.83 MB)
In [ ]:
earlystopping = EarlyStopping(monitor="val_val_auc", mode="max", restore_best_weights=True, patience=5)
model1.fit(train_padded, y_train_encoded, validation_data=(val_padded, y_val_encoded), epochs=10, batch_size= 64, callbacks=[earlystopping])
Epoch 1/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 21s 104ms/step - loss: 1.4480 - val_auc: 0.6660 - val_loss: 1.3473 - val_val_auc: 0.7362
Epoch 2/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 94ms/step - loss: 1.2894 - val_auc: 0.7553 - val_loss: 1.3221 - val_val_auc: 0.7487
Epoch 3/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 94ms/step - loss: 1.2300 - val_auc: 0.7798 - val_loss: 1.3366 - val_val_auc: 0.7514
Epoch 4/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 94ms/step - loss: 1.2085 - val_auc: 0.7913 - val_loss: 1.3198 - val_val_auc: 0.7553
Epoch 5/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 95ms/step - loss: 1.1381 - val_auc: 0.8163 - val_loss: 1.3828 - val_val_auc: 0.7404
Epoch 6/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 95ms/step - loss: 1.0990 - val_auc: 0.8307 - val_loss: 1.3838 - val_val_auc: 0.7428
Epoch 7/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 95ms/step - loss: 1.0170 - val_auc: 0.8594 - val_loss: 1.4620 - val_val_auc: 0.7297
Epoch 8/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 94ms/step - loss: 0.9312 - val_auc: 0.8813 - val_loss: 1.5760 - val_val_auc: 0.7353
Epoch 9/10
134/134 ━━━━━━━━━━━━━━━━━━━━ 13s 94ms/step - loss: 0.8274 - val_auc: 0.9069 - val_loss: 1.5752 - val_val_auc: 0.7239
Out[ ]:
<keras.src.callbacks.history.History at 0x7a082c5db550>
In [ ]:
model1.evaluate(test_padded, y_test_encoded)
70/70 ━━━━━━━━━━━━━━━━━━━━ 2s 26ms/step - loss: 1.2888 - val_auc: 0.7587
Out[ ]:
[1.2863892316818237, 0.7601470947265625]

📈 Observations¶

  • Validation AUC improves until epoch 4, peaking at 0.7553.
  • Training loss decreases steadily, but validation loss rises after epoch 4 → likely overfitting.
  • Non-trainable embeddings: the embedding layer is frozen because it uses the pretrained Word2Vec Google News vectors.
  • Test AUC (0.7601) is slightly higher than the peak validation AUC (0.7553), indicating good generalization.

LLM (DistilBERT) Pretrained Model¶

In [ ]:
tokenizer_dist = AutoTokenizer.from_pretrained("distilbert-base-uncased")
In [ ]:
def tokenize_function(data):
    return tokenizer_dist(data["text"], truncation=True)
In [ ]:
tokenized_dataset = dataset.map(tokenize_function, batched=True)
In [ ]:
data_collator = DataCollatorWithPadding(tokenizer= tokenizer_dist, return_tensors="tf")
In [ ]:
roc_auc = evaluate.load("roc_auc", "multiclass")
In [ ]:
def compute_metrics(eval_pred):

  # Convert logits to class probabilities; multiclass one-vs-rest ROC AUC
  # needs per-class scores, not raw logits.
  preds, labels = eval_pred
  probs = softmax(preds, axis=1)
  return roc_auc.compute(prediction_scores=probs, references=labels, average="macro", multi_class="ovr")
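The `softmax(preds, axis=1)` call turns each row of logits into a probability distribution, which the `ovr` ROC AUC computation expects; a numerically stable NumPy version of the same operation:

```python
import numpy as np

def softmax_rows(logits):
    # Subtract the row max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

probs = softmax_rows(np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]))
print(probs.sum(axis=1))  # each row sums to 1
```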
In [ ]:
label_1 = train[["label","label_text"]].drop_duplicates().set_index("label")["label_text"].to_dict()
label_2 = train[["label_text", "label"]].drop_duplicates().set_index("label_text")["label"].to_dict()
In [ ]:
tf.keras.backend.clear_session()
gc.collect()
model2 = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased", num_labels=5, id2label=label_1, label2id=label_2)
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.bias']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFDistilBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
In [ ]:
batch_size = 16
num_epochs = 5
batches_per_epoch = len(tokenized_dataset["train"]) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)
optimizer, schedule = create_optimizer(init_lr= 3e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
In [ ]:
tf_train_set = model2.prepare_tf_dataset(tokenized_dataset["train"], shuffle=True, batch_size=16, collate_fn=data_collator)
tf_validation_set = model2.prepare_tf_dataset(tokenized_dataset["validation"], shuffle=False, batch_size=16, collate_fn=data_collator)
tf_test_set = model2.prepare_tf_dataset(tokenized_dataset["test"], shuffle=False, batch_size=16, collate_fn=data_collator)
In [ ]:
model2.compile(optimizer=optimizer)
In [ ]:
metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
callbacks = [metric_callback]
In [ ]:
model2.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=5, callbacks=callbacks)
Epoch 1/5
534/534 [==============================] - 79s 116ms/step - loss: 1.2494 - val_loss: 1.1887 - roc_auc: 0.8105
Epoch 2/5
534/534 [==============================] - 50s 93ms/step - loss: 0.9486 - val_loss: 1.1901 - roc_auc: 0.8171
Epoch 3/5
534/534 [==============================] - 49s 93ms/step - loss: 0.6869 - val_loss: 1.3317 - roc_auc: 0.8154
Epoch 4/5
534/534 [==============================] - 50s 93ms/step - loss: 0.4683 - val_loss: 1.5756 - roc_auc: 0.8113
Epoch 5/5
534/534 [==============================] - 49s 92ms/step - loss: 0.3173 - val_loss: 1.6810 - roc_auc: 0.8088
Out[ ]:
<tf_keras.src.callbacks.History at 0x7a33326f82d0>
In [ ]:
test_inputs_only = tf_test_set.map(lambda x, y: x)
test_label_only = tf_test_set.map(lambda x, y: y)
batches = [y.numpy() for y in test_label_only]
y_true = np.concatenate(batches, axis=0)
In [ ]:
result = softmax(model2.predict(test_inputs_only)["logits"], axis=1)
139/139 [==============================] - 4s 29ms/step
In [ ]:
roc_auc.compute(prediction_scores = result, references=y_true, average="macro", multi_class="ovr")
Out[ ]:
{'roc_auc': 0.8277722980808679}
In [ ]:
pipe = pipeline("text-classification", model=model2, tokenizer=tokenizer_dist)
Device set to use 0
In [ ]:
print("*******************Test Run******************************")
print(pipe(test["text"][0]))
print(f'Actual text: {test["text"][0]}, Actual Class: {test["label_text"][0]}')
*******************Test Run******************************
[{'label': 'negative', 'score': 0.8883830904960632}]
Actual text: no movement , no yuks , not much of anything ., Actual Class: negative

📈 Observations¶

  • Best validation ROC AUC observed at epoch 2 (0.8171).
  • Overfitting begins after epoch 2: validation loss climbs while validation ROC AUC stays relatively stable.
  • The final test ROC AUC of 0.8278 indicates strong generalization despite the rising validation loss.

Model Comparison: Word2Vec vs. DistilBERT¶


🧱 Model Architecture & Embeddings¶

| Feature | Model 1: Word2Vec | Model 2: DistilBERT |
|---|---|---|
| Embedding Type | Static pretrained Word2Vec | Contextual DistilBERT (transformer-based) |
| Trainable Embeddings | ❌ No (frozen) | ✅ Yes (fully fine-tuned) |
| Model Architecture | Embedding → BiLSTM (×3) → Dense | DistilBERT → Dense |
| Trainable Params | BiLSTM and Dense layers only (embedding frozen) | Fully trainable |
| Total Params | ~4.4M in the frozen embedding, plus the LSTM/Dense layers | ~66M |

📊 Training & Validation Performance¶

| Metric | Model 1: Word2Vec | Model 2: DistilBERT |
|---|---|---|
| Best Val AUC | 0.7553 (epoch 4) | 0.8171 (epoch 2) |
| Final Test AUC | 0.7601 | 0.8278 (macro, full set) |
| Overfitting Point | After epoch 4 | After epoch 2 |
| Val Loss Trend | Increases after epoch 4 | Increases after epoch 2 |
| Train Loss Trend | Smooth decrease | Rapid decrease |
| Training Time | ~13 s per epoch | ~49–79 s per epoch |

📈 Performance Summary¶

  • Model 2 (DistilBERT) generalizes significantly better, with a final macro test AUC of 0.8278 versus 0.7601 for Model 1.
  • Model 1 is capped by its frozen, context-free embeddings and lower capacity; Model 2 overfits slightly after 2 epochs but still generalizes better.
  • Word2Vec embeddings are static and miss contextual nuance, while DistilBERT encodes meaning dynamically based on context.

✅ Thoughts¶

  • Prefer DistilBERT (Model 2) when computational resources allow, especially for nuanced or context-rich text classification tasks.
  • Consider the lighter Word2Vec-based model, possibly with the embedding layer unfrozen for fine-tuning, when compute or model size is constrained.

📝 Conclusion: DistilBERT outperforms Word2Vec in both AUC and learning dynamics, thanks to richer embeddings and a more expressive architecture.