Isabel Loci, Verrazzano Class of 2026
Completed a major in Computer Science and a minor in Mathematics
Somewhere in the last year, I fell down a rabbit hole of learning about the most notorious cyber-attacks in history. These attacks date back decades, to when the online world was still fairly new, and they grew more complex as it developed. I have always been concerned about the safety and privacy of any information I use online, so it got me thinking: how do people mitigate attacks on such a large scale? At the time, I was participating in a year-long data science boot camp, and we were focusing on building AI models with machine learning. The basic idea of machine learning is a system that learns from previously collected data, detects patterns, and then makes decisions on its own without being explicitly programmed for each case. This allows a trained system to handle data it has never seen before, because it can make decisions independently after training. Neat, right?
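To make the "learn from data, then decide" idea concrete, here is a toy sketch in Python of a nearest-centroid classifier, one of the simplest learning methods. It is only an illustration, not the model our project used: the function names `train` and `predict`, the two features (packets per second and average packet size), and the sample values are all made up. Training learns one average point per label; predicting picks whichever average is closest to a new, unseen flow.

```python
def train(samples):
    """Learn one centroid (mean feature vector) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Decide on unseen data: return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((c - f) ** 2 for c, f in zip(centroids[label], features)),
    )


# Hypothetical training data: (packets per second, average packet size in KB)
training = [
    ([10, 0.5], "normal"),
    ([12, 0.6], "normal"),
    ([900, 0.1], "attack"),
    ([1000, 0.05], "attack"),
]
model = train(training)
print(predict(model, [15, 0.55]))   # → normal (a flow the model never saw)
print(predict(model, [850, 0.08]))  # → attack
```

Note that packets per second dominates the distance here because it is on a much larger scale; real pipelines usually rescale features during preprocessing.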
In theory, I thought it should be possible to create a machine learning-based AI model that can distinguish normal network traffic from malicious network traffic. So a few friends and I decided to bring this idea to life. Our project was built in Python and deployed through Hugging Face. The model's final accuracy was 92.94%: good, but it could have been much better.
When I was brainstorming capstone ideas with my mentor, Dr. Huo, I mentioned this project. I talked about how the methodology we chose had a lot of room for improvement, and how I would have liked to go back and start all over again had I known what approach to try next. Then she suggested a simple but brilliant idea: what if my capstone were a deep research survey on the topic of anomaly intrusion detection?
It all clicked then. I had previously read one or two papers by people who attempted the same project, but not much beyond that. What I didn't realize was just how many ways data scientists had approached this problem in the past, using not just traditional machine learning methods but also deep learning ones. My work revolved around studying as many research papers as I could, summarizing my findings, and presenting them as a survey paper.
What I found hardest (and a little funny) was that whenever I came across a concept I hadn't heard of before, four more fundamental concepts were attached to it that I absolutely could not leave out of the survey paper. I spent hours upon hours studying these concepts, breaking down complex formulas, and comparing the results of different methodologies just to understand the hidden connections between them.
There is a general step-by-step process that data scientists follow when building an ML model: sourcing the data used for training, preprocessing the data, selecting an appropriate algorithm, training the model, and finally evaluating it. Data preprocessing is important because a dataset can have issues such as missing values, duplicate columns, and badly or inconsistently named columns, all of which can hurt the model's accuracy. The dataset may also be imbalanced, with many samples of one attack type and too few of another, so it needs to be balanced. As I mentioned earlier, these approaches fall into two broad categories, traditional machine learning and deep learning, and each category contains several distinct algorithms. After the model is trained, it needs to be evaluated with industry-standard performance metrics, such as accuracy, precision, recall, and F1 score, which estimate how well the model is doing. If these methods are implemented correctly, the metrics will reflect the model's true performance. Just be careful: a score that seems too good to be true often is! The goal is to get close to a 100% accuracy score, but not too close.
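The pipeline steps above can be sketched in a few small functions. This is a minimal, illustrative sketch in plain Python, not the code from our project or from any paper in the survey. The function names, the `label` column, the `"attack"`/`"normal"` class names, and the use of `None` for missing values are all assumptions. It covers three steps: cleaning (normalizing column names, dropping rows with missing values, dropping duplicate columns), balancing by random oversampling (one of several balancing techniques; others, such as SMOTE, generate synthetic samples instead), and evaluation with accuracy, precision, recall, and F1. In practice, libraries such as pandas, scikit-learn, and imbalanced-learn handle these steps.

```python
import random


def clean_dataset(rows, label="label"):
    """Normalize column names, drop rows with missing values, drop duplicate columns.

    Assumes every row shares the same columns and that missing values are None.
    """
    # Normalize inconsistent names, e.g. " Src Bytes" -> "src_bytes"
    rows = [{k.strip().lower().replace(" ", "_"): v for k, v in r.items()} for r in rows]
    # Drop any row that has a missing value
    rows = [r for r in rows if all(v is not None for v in r.values())]
    if not rows:
        return []
    # Drop a feature column if its values exactly repeat an earlier column's values
    keep, seen = [], set()
    for col in rows[0]:
        if col == label:
            keep.append(col)  # never drop the label column
            continue
        values = tuple(r[col] for r in rows)
        if values not in seen:
            seen.add(values)
            keep.append(col)
    return [{c: r[c] for c in keep} for r in rows]


def oversample(rows, label="label", seed=0):
    """Random oversampling: re-draw minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def evaluate(y_true, y_pred, positive="attack"):
    """Accuracy, precision, recall and F1, treating `positive` as the class to detect."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # 95 normal flows and 5 attacks, scored against a "model" that always answers "normal"
    y_true = ["normal"] * 95 + ["attack"] * 5
    y_pred = ["normal"] * 100
    print(evaluate(y_true, y_pred))
```

Running the file scores a "model" that labels every flow as normal on a dataset with 95 normal flows and 5 attacks. Accuracy comes out to 0.95, yet recall and F1 for the attack class are both 0.0 because it never catches a single attack. That is exactly the kind of score that looks better than it is, and it is one reason imbalanced data is balanced before training and judged with more than accuracy alone.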
This has been the most extensive research I have worked on during my undergraduate years, and it has sparked a desire in me to do more. I already want to delve into papers written in other fields: Literature, Psychology, Physics, the Arts, and so much more. It has changed the way I interact with the world around me and how I embrace and use new information.
Working on a thesis might not have been my first idea for completing my capstone, but I am so glad it's what I chose in the end.