Monsef Rachid

Digital Media Research Tool

Digital Media Research Tool for the Digital Media Alliance of Florida

Project Overview

This web application was built as a research tool for the Digital Media Alliance of Florida, designed to collect and classify data on digital media companies across the state. The project involved developing a web scraper to extract company information from public records, structuring the data in a database organized by county, and importing NAICS codes for classification.

To classify companies automatically, I implemented a variant of Latent Semantic Indexing (LSI) that analyzes company names, descriptions, and other metadata to assign each company a business category. A web interface lets researchers review, verify, and correct those classifications as needed. I also integrated Mapbox to map digital media companies across Florida.
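To illustrate the classification idea, here is a minimal LSI sketch in Python. It is not the project's PHP code, and the specific variant used in the project is not described here. The training texts, the two category labels, and every function name are hypothetical. The sketch builds a term-document matrix from labeled examples, finds the top singular directions with power iteration on the Gram matrix, projects a new company description into that latent space, and returns the label of the closest example by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical labeled examples; the project's real training data is not shown.
TRAINING = [
    ("video film production studio", "Motion Picture and Video Production"),
    ("film video editing", "Motion Picture and Video Production"),
    ("software app development", "Software Publishers"),
    ("software web development", "Software Publishers"),
]


def tokenize(text):
    return text.lower().split()


def term_vector(text, vocab):
    """Raw term counts for `text`, ordered like `vocab`; unknown words are ignored."""
    counts = Counter(tokenize(text))
    return [float(counts.get(term, 0)) for term in vocab]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenpairs(gram, k, iterations=300):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(gram)
    gram = [row[:] for row in gram]  # work on a copy
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs deterministic
        for _ in range(iterations):
            nxt = [dot(row, vec) for row in gram]
            norm = math.sqrt(dot(nxt, nxt))
            if norm == 0:
                break
            vec = [x / norm for x in nxt]
        value = dot(vec, [dot(row, vec) for row in gram])
        pairs.append((value, vec))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= value * vec[i] * vec[j]
    return pairs


def build_model(training, k=2):
    vocab = sorted({t for text, _ in training for t in tokenize(text)})
    docs = [term_vector(text, vocab) for text, _ in training]  # columns of A
    gram = [[dot(a, b) for b in docs] for a in docs]           # A^T A
    basis = []
    for value, vec in top_eigenpairs(gram, k):
        if value <= 1e-9:
            continue  # skip directions with no variance
        sigma = math.sqrt(value)
        # Left singular vector u = A v / sigma, expressed over the vocabulary.
        u = [sum(vec[j] * docs[j][t] for j in range(len(docs))) / sigma
             for t in range(len(vocab))]
        basis.append(u)
    doc_coords = [[dot(u, d) for u in basis] for d in docs]
    labels = [label for _, label in training]
    return vocab, basis, doc_coords, labels


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def classify(text, model):
    """Label of the nearest training example in latent space, or None if nothing matches."""
    vocab, basis, doc_coords, labels = model
    raw = term_vector(text, vocab)
    query = [dot(u, raw) for u in basis]
    scores = [cosine(query, d) for d in doc_coords]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best] if scores[best] > 0 else None


model = build_model(TRAINING)
print(classify("indie film production company", model))
```

With this toy data, a description that shares no vocabulary with the examples returns None, which is the kind of case a researcher would classify by hand in the review interface.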

This project showcases my ability to build end-to-end, data-driven applications that combine web scraping, data processing, automated classification, and interactive visualization to support research and decision-making.

Tools Used

PHP, MySQL, TCPDF, Mapbox

Project Details

  • Requirement Analysis: Gathered specifications from the Digital Media Alliance of Florida to define project goals and data requirements.
  • Data Collection: Developed a web scraper to extract company information from public records and structured the data into a relational database.
  • Data Organization: Grouped companies by county and imported NAICS codes for industry classification.
  • Automated Classification: Implemented a variant of Latent Semantic Indexing (LSI) to classify companies based on names, descriptions, and other metadata.
  • Database Design: Designed a scalable database schema to store company records, classification results, and manual adjustments by researchers.
  • Web Interface Development: Built an intuitive web-based dashboard for researchers to review and refine classifications.
  • Data Visualization: Integrated Mapbox to plot digital media companies across Florida for spatial analysis.
  • Search and Filtering: Implemented advanced search and filtering capabilities to allow researchers to easily navigate the dataset.
  • User Access and Permissions: Developed user authentication and role-based access control for secure data management.
  • Performance Optimization: Optimized database queries and added caching for fast data retrieval and processing.
  • Data Integrity and Validation: Implemented validation rules to ensure accurate and clean data ingestion.
  • Deployment: Deployed the application to a production environment with security best practices in place.
  • Maintenance & Iteration: Provided ongoing improvements based on researcher feedback, refining classification algorithms and user experience.
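
As an illustration of the Data Visualization step, the sketch below converts company records into a GeoJSON FeatureCollection, a format Mapbox can load as a map data source. It is written in Python for brevity and is not the project's PHP code. The record fields (name, county, category, lat, lon) and the sample rows are assumptions, not the project's actual schema or data. Note that GeoJSON stores coordinates as longitude first, then latitude.

```python
import json


def companies_to_geojson(companies):
    """Build a GeoJSON FeatureCollection from company records (hypothetical fields)."""
    features = []
    for company in companies:
        # Skip records that have not been geocoded yet.
        if company.get("lat") is None or company.get("lon") is None:
            continue
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [company["lon"], company["lat"]],
            },
            "properties": {
                "name": company["name"],
                "county": company["county"],
                "category": company["category"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows for illustration only.
companies = [
    {"name": "Example Studio A", "county": "Orange",
     "category": "Motion Picture and Video Production",
     "lat": 28.5383, "lon": -81.3792},
    {"name": "Example Software B", "county": "Miami-Dade",
     "category": "Software Publishers", "lat": None, "lon": None},
]

collection = companies_to_geojson(companies)
print(json.dumps(collection, indent=2))
```

Keeping county and category in each feature's properties would let the map filter or color points by either field without another database query.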