This project involved web scraping, data cleaning, and analysis using Python. To practice ethical scraping, I built a dummy website of my own rather than scraping third-party sites without permission. The project focused on extracting and analyzing data from this custom site.
The process began with scraping articles and tables from my dummy site using BeautifulSoup. I then cleaned the scraped data with Pandas, which included converting the date and numeric columns to proper types. Two things slowed me down along the way: inconsistent date and numeric formats in the scraped data, and converting Jupyter Notebooks to PDF. Guidance from the Twitter community helped resolve these issues.
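The scraping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the HTML string, the `article`/`h2`/`p` structure, the `sales` table id, and the helper names `parse_articles` and `parse_table` are all assumptions standing in for the dummy site, whose markup is not shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one page of the dummy site (assumed markup).
SAMPLE_HTML = """
<html><body>
  <article><h2>First post</h2><p>Hello world.</p></article>
  <article><h2>Second post</h2><p>More text.</p></article>
  <table id="sales">
    <tr><th>Date</th><th>Amount</th></tr>
    <tr><td>2023-01-05</td><td>$1,200.50</td></tr>
    <tr><td>05/02/2023</td><td>980</td></tr>
  </table>
</body></html>
"""


def parse_articles(soup):
    """Collect the title and body text of every <article> element."""
    articles = []
    for node in soup.find_all("article"):
        title = node.find("h2")
        body = node.find("p")
        articles.append({
            "title": title.get_text(strip=True) if title else "",
            "body": body.get_text(strip=True) if body else "",
        })
    return articles


def parse_table(soup, table_id):
    """Turn an HTML table into a list of dicts keyed by the header row."""
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(header, cells)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
articles = parse_articles(soup)
table_rows = parse_table(soup, "sales")
```

With a live site, the HTML would come from an HTTP request instead of a string. The list of dicts returned by `parse_table` can be passed straight to `pandas.DataFrame` for cleaning.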
The main challenges were inconsistent date formats and numeric columns that needed cleaning before they could be analyzed. Converting Jupyter Notebooks to PDF was also problematic. I resolved the data issues by normalizing the formats in Pandas, and technical advice from Twitter helped with the PDF conversion.
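One way to handle this kind of cleanup in Pandas is sketched below. The sample values, the column names, the list of date formats, and the helpers `parse_dates` and `clean_numeric` are assumptions made for illustration; in particular, the sketch assumes slash-separated dates are day-first, which may not match the real data.

```python
import pandas as pd

# Hypothetical sample of messy scraped values (not the project's real data).
raw = pd.DataFrame({
    "Date": ["2023-01-05", "05/02/2023", "03 Mar 2023", "not a date"],
    "Amount": ["$1,200.50", "980", " 45.00 ", "n/a"],
})

# Assumed formats, tried in order; "%d/%m/%Y" treats 05/02/2023 as 5 February.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %b %Y"]


def parse_dates(series, formats):
    """Try each format in turn; values matching none of them become NaT."""
    parsed = pd.to_datetime(series, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        attempt = pd.to_datetime(series, format=fmt, errors="coerce")
        parsed = parsed.fillna(attempt)
    return parsed


def clean_numeric(series):
    """Strip currency symbols, thousands separators and spaces, then convert."""
    stripped = series.astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce")


clean = pd.DataFrame({
    "Date": parse_dates(raw["Date"], DATE_FORMATS),
    "Amount": clean_numeric(raw["Amount"]),
})
```

Using `errors="coerce"` keeps unparseable values as `NaT` or `NaN` instead of raising, so the problem rows can be inspected afterwards with `clean[clean.isna().any(axis=1)]`.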
This project enhanced my skills in web scraping, data cleaning with Pandas, and troubleshooting Jupyter Notebook conversions. I gained valuable insights from the community and improved my ability to manage and analyze web data effectively.