Introduction

When I embarked on the AI Insights project, I aimed to create a robust, scalable solution to measure public sentiment towards AI topics using Twitter data. The project leverages a microservices architecture, ensuring that each component is modular, maintainable, and can be independently scaled. This approach not only simplifies development and deployment but also enhances the project's resilience and flexibility.

Project Overview

AI Insights is a comprehensive sentiment analysis platform that collects, processes, and visualizes public sentiment towards AI-related topics. The project consists of three primary microservices:

  1. Data Collector Service: This service interfaces with the Twitter API to pull in recent tweets related to AI. The collected tweets are then sent to a RabbitMQ message queue.
  2. Data Analyzer Service: This service consumes tweets from the RabbitMQ queue and uses Google Cloud's Natural Language Processing (NLP) service to analyze the sentiment of each tweet. The analyzed data, including sentiment scores and magnitudes, is then persisted to a PostgreSQL database.
  3. Web Service: A Flask application that provides the backend API and serves static content. It uses SQLAlchemy for database access and Flask-Smorest with Marshmallow for API management and data validation, while D3.js visualizes the sentiment analysis results.

Technologies Used

  • Python: The primary programming language for all microservices.
  • Docker: Containerization technology to package the microservices.
  • Docker Compose: For container orchestration.
  • RabbitMQ: Message queue for decoupling the data collector and data analyzer services.
  • Google Cloud NLP: For sentiment analysis.
  • PostgreSQL: Database for persisting analyzed data.
  • Flask: Web framework for the web service.
  • SQLAlchemy: ORM for database interactions.
  • Flask-Smorest: For building RESTful APIs.
  • Marshmallow: For data validation.
  • D3.js: For data visualization.
  • GitHub Actions: For CI/CD to deploy the application to an AWS EC2 Ubuntu instance.

Microservices Architecture

Data Collector Service

The Data Collector Service is responsible for interfacing with the Twitter API to collect tweets related to AI. It processes the tweets and sends them to a RabbitMQ message queue for further analysis. This decoupling allows the data collection and analysis processes to scale independently, enhancing the system's overall flexibility and robustness.
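To make this concrete, here is a minimal sketch of what the collector's core loop might look like, assuming Tweepy for the Twitter API and Pika for RabbitMQ; the search query, queue name, and environment variable names are illustrative rather than the project's exact values.

```python
import json
import os

import pika
import tweepy

# Assumed configuration -- the real project may use different names/values.
QUEUE_NAME = "tweets"
SEARCH_QUERY = "artificial intelligence -is:retweet lang:en"


def collect_and_publish():
    # Twitter API v2 client; the bearer token is read from the environment.
    twitter = tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])
    response = twitter.search_recent_tweets(query=SEARCH_QUERY, max_results=100)

    # Publish each tweet to RabbitMQ so the analyzer can pick it up later.
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=os.environ.get("RABBITMQ_HOST", "rabbitmq"))
    )
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME, durable=True)

    for tweet in response.data or []:
        channel.basic_publish(
            exchange="",
            routing_key=QUEUE_NAME,
            body=json.dumps({"id": tweet.id, "text": tweet.text}),
        )

    connection.close()


if __name__ == "__main__":
    collect_and_publish()
```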

Data Analyzer Service

The Data Analyzer Service consumes tweets from the RabbitMQ queue and analyzes their sentiment using Google Cloud's Natural Language Processing service. The service calculates sentiment scores and magnitudes, categorizing tweets as positive, negative, or neutral. The analyzed tweets are then stored in a PostgreSQL database for further querying and visualization.
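A rough sketch of the analyzer's consume-analyze-persist loop is below. The queue name, table schema, and classification thresholds are assumptions for illustration, not the project's actual values.

```python
import json
import os

import pika
import psycopg2
from google.cloud import language_v1

QUEUE_NAME = "tweets"  # assumed queue name, matching the collector sketch above
nlp = language_v1.LanguageServiceClient()


def classify(score):
    # Assumed thresholds; the project may draw the lines differently.
    if score >= 0.25:
        return "positive"
    if score <= -0.25:
        return "negative"
    return "neutral"


def handle_message(channel, method, properties, body):
    tweet = json.loads(body)

    # Document-level sentiment for the tweet text via Google Cloud NLP.
    document = language_v1.Document(
        content=tweet["text"], type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = nlp.analyze_sentiment(document=document).document_sentiment

    # Persist score, magnitude, and the derived category to PostgreSQL
    # (table and column names are hypothetical).
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO tweets (tweet_id, text, score, magnitude, category) "
            "VALUES (%s, %s, %s, %s, %s)",
            (tweet["id"], tweet["text"], sentiment.score,
             sentiment.magnitude, classify(sentiment.score)),
        )

    channel.basic_ack(delivery_tag=method.delivery_tag)


if __name__ == "__main__":
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=os.environ.get("RABBITMQ_HOST", "rabbitmq"))
    )
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    channel.basic_consume(queue=QUEUE_NAME, on_message_callback=handle_message)
    channel.start_consuming()
```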

Web Service

The Web Service is a Flask application that provides a RESTful API for accessing the analyzed sentiment data. It also serves static content, including data visualizations created with D3.js. The service uses Flask-Smorest and Marshmallow for API management and request/response validation, ensuring that the data served to clients is consistently structured and validated.
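The sketch below shows one way Flask, Flask-Smorest, Marshmallow, and SQLAlchemy might be wired together for such an endpoint; the model fields, route path, and configuration values are assumptions for illustration.

```python
from flask import Flask
from flask.views import MethodView
from flask_smorest import Api, Blueprint
from flask_sqlalchemy import SQLAlchemy
from marshmallow import Schema, fields

app = Flask(__name__)
app.config["API_TITLE"] = "AI Insights API"
app.config["API_VERSION"] = "v1"
app.config["OPENAPI_VERSION"] = "3.0.3"
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@db/insights"  # assumed

db = SQLAlchemy(app)
api = Api(app)


class Tweet(db.Model):
    __tablename__ = "tweets"  # assumed table name, matching the analyzer sketch
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.Text)
    score = db.Column(db.Float)
    magnitude = db.Column(db.Float)
    category = db.Column(db.String(16))


class TweetSchema(Schema):
    id = fields.Int(dump_only=True)
    text = fields.Str()
    score = fields.Float()
    magnitude = fields.Float()
    category = fields.Str()


blp = Blueprint("tweets", __name__, url_prefix="/api/tweets")


@blp.route("/")
class TweetList(MethodView):
    @blp.response(200, TweetSchema(many=True))
    def get(self):
        # Marshmallow serializes the ORM objects according to the schema above.
        return Tweet.query.all()


api.register_blueprint(blp)
```

A nice side effect of Flask-Smorest is that the same schema decorators double as OpenAPI documentation for the API.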

CI/CD and Deployment

The entire application is containerized using Docker, with Docker Compose handling the orchestration of multiple containers. GitHub Actions is used to implement a continuous integration and continuous deployment (CI/CD) pipeline. The pipeline ensures that any changes pushed to the repository are automatically built, tested, and deployed to an AWS EC2 Ubuntu instance. This automated process minimizes downtime and ensures that the application is always up-to-date with the latest features and fixes.

Highlights

Microservices Architecture

The microservices architecture of AI Insights ensures that each component is modular and independently scalable. This design pattern enhances the maintainability and flexibility of the system, allowing for easier updates and scaling.

Message Queue

The use of RabbitMQ as a message queue decouples the data collection and analysis processes, allowing them to operate independently and asynchronously. This decoupling improves the system's resilience and scalability, as each component can be scaled according to its specific load.

CI/CD and Containerization

The application is fully containerized using Docker, ensuring consistency across different environments. Docker Compose simplifies the orchestration of multiple containers, while GitHub Actions automates the CI/CD pipeline, providing a seamless deployment process. This setup not only improves efficiency but also enhances the reliability and stability of the application.

Lessons Learned

I wanted to build an application using modern, big-data software architecture patterns, and to explore how to implement, on a small scale, a containerized app made of independent microservices that all function together. My ultimate career goal at this point is to transition to DevOps, and modern enterprise applications are composed of many small services linked together. While the scope of this project is small, dividing up each service, containerizing it, orchestrating the containers with Docker Compose, and deploying it all to the cloud via CI/CD principles really taught me a lot.

Where do I go from here?

I really like this app and want to continue working on it. The largest downside is that the Twitter API is prohibitively expensive: I paid $100 for one month of access, and that's the lowest available tier. Still, I'm not done with this project. I'd like to make some of the following changes and updates in the future:

  • Adjust the Twitter search query to filter out the many garbage tweets that are irrelevant to the goal of analyzing public sentiment toward AI. A lot of the tweets are ads or sales pitches, and some mention AI only tangentially without being AI-focused. I think I'll explore a more context-aware sentiment analysis tool later on. As my master's degree at CU Boulder continues and I take more AI, LLM, and data science courses, I should be able to train a model from scratch for this explicit purpose.

  • Add data sources beyond Twitter to get a more holistic view of public opinion. This is where the microservices architecture pays off: because the analyzer service is a separate, independent component, it is source-agnostic. It simply consumes messages from the queue and analyzes whatever data they carry.

  • Make the app more resilient to large data volumes. At the moment, the app is fine at small scale, but if we start taking in larger data volumes, we won't be able to fetch every analyzed message from the DB and summarize the analytics on the client side anymore. We will probably need to summarize data at the DB level and then expose the summaries through the API; a rough sketch of such a query follows this list.

  • Explore Kubernetes. Docker Compose was great, but one downside I ran into is that its dependency feature can tell you when a container is up, but NOT whether it is ready to accept connections. Since several services depend on the MQ or the DB being fully initialized, this can lead to services failing to start because their dependencies aren't accepting connections yet. I got around this by writing a Python script that waits for the services to become available and running it before starting the dependent services (a simplified version is sketched below). I think Kubernetes would solve this issue, probably bring a lot of other benefits, and I need to learn it anyway.
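For the DB-level summaries mentioned above, a query along these lines, sketched here with SQLAlchemy and hypothetical model/column names, could return a handful of aggregate rows instead of every analyzed tweet:

```python
from sqlalchemy import func


def sentiment_summary(session, Tweet):
    # Count tweets and average sentiment per category so the API can serve
    # a small summary payload instead of the full result set.
    return (
        session.query(
            Tweet.category,
            func.count(Tweet.id).label("tweet_count"),
            func.avg(Tweet.score).label("avg_score"),
        )
        .group_by(Tweet.category)
        .all()
    )
```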
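The wait-for script was essentially a loop that retries TCP connections until each dependency answers. A simplified sketch follows; the host names and timeout are illustrative values matching typical Docker Compose service names.

```python
import socket
import sys
import time

# (host, port) pairs the app depends on; names assume the Compose service names.
SERVICES = [("rabbitmq", 5672), ("db", 5432)]
TIMEOUT_SECONDS = 60


def wait_for(host, port):
    # Retry a plain TCP connection until the service answers or we time out.
    deadline = time.time() + TIMEOUT_SECONDS
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                print(f"{host}:{port} is accepting connections")
                return True
        except OSError:
            time.sleep(1)
    return False


if __name__ == "__main__":
    if not all(wait_for(host, port) for host, port in SERVICES):
        sys.exit("timed out waiting for dependencies")
```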

Conclusion

The AI Insights project showcases the power of a microservices architecture combined with modern DevOps practices. By leveraging Docker, RabbitMQ, Google Cloud NLP, and Flask, we created a scalable and maintainable sentiment analysis platform. The CI/CD pipeline implemented with GitHub Actions ensures that the application is always up-to-date and running smoothly. You can explore the live demo here and the source code on GitHub here. Stay tuned for more updates and features as the project continues to evolve!