[Life] 2024 TSMC IT CareerHack Campus Hackathon

Participation

I was invited by Chen-Zhong to participate in TSMC’s 2024 Campus Hackathon. Riding on the momentum from our good results at the Mei-Chu Hackathon, I wanted to see if we could win a second award, so I signed up for this competition.

When we were looking at the themes, since He-Kun and I had already read many CV (computer vision) papers and implementations in our reading group, we decided on the “AI Image Storytelling” theme.

Preliminary Round

To actually compete in the finals at their Taipei office on 1/26 and 1/27, we first had to pass an online preliminary round. It basically tested programming ability, with languages like C++, Python, and Java to choose from. The problems were relatively simple - mine were just greedy algorithms and some very basic exercises, though I heard from friends that there were also tree-related problems. It was basic enough to remind me of late-night freshman programming sessions.

Anyway, I personally feel the preliminaries had little connection to the finals, and I found it quite surprising that they used this mechanism for screening. (I heard over 100 teams registered, and 24 were eventually selected.)

Finals

After that, we were notified we made it to the finals, along with information about the problem and hackathon schedule. We started preparing.

Problem: AI Image Storytelling

The organizers provided a dataset containing images with safety helmets and GPT conversation results, plus a LLaVA model fine-tuned with LoRA. Our task was to use this information to build a system that could process images and answer these questions:

{
    "Q1": "Is there any people not wearing a helmets? Please answer with true or false.",
    "Q2": "How many people are there in the image? Please give a single integer answer.",
    "Q3": "How many people are there and wear a helmets in the image? Please give a single integer answer.",
    "Q4": "How many people are there and not wear a helmets in the image? Please give a single integer answer.",
    "Q5": "Tell the color of the helmets that people wear in the image from this list: red, yellow, blue, white, green, orange, black.The answer may be more than one color.If there is no one wear a helmets, please answer None as response.",
    "Q6": "What is the warning message shown in the image? Just give me the warning sentence.",
    "Q7": "Is there anyone violating the warning message describe in the image? Please answer with true or false."
}
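The answer types implied by these questions can be sanity-checked before submission. A minimal sketch - the helper name, the per-image answer shape, and the check rules are my own assumptions, not the organizers’ spec:

```python
import json

# Allowed helmet colors for Q5, per the problem statement.
COLORS = {"red", "yellow", "blue", "white", "green", "orange", "black"}

def validate_answers(ans: dict) -> list:
    """Return a list of format problems found in one image's answer dict."""
    problems = []
    for q in ("Q1", "Q7"):  # true/false questions
        if ans.get(q) not in ("true", "false"):
            problems.append(f"{q} must be 'true' or 'false'")
    for q in ("Q2", "Q3", "Q4"):  # single-integer counts
        if not isinstance(ans.get(q), int):
            problems.append(f"{q} must be an integer")
    q5 = ans.get("Q5")  # list of known colors, or "None" if nobody wears a helmet
    if q5 != "None" and not (isinstance(q5, list) and set(q5) <= COLORS):
        problems.append("Q5 must be a list of known colors or 'None'")
    if not isinstance(ans.get("Q6"), str):  # free-form warning sentence
        problems.append("Q6 must be a string")
    return problems

example = {"Q1": "false", "Q2": 3, "Q3": 3, "Q4": 0,
           "Q5": ["yellow", "white"], "Q6": "Hard hat area", "Q7": "false"}
print(json.dumps(validate_answers(example)))  # an empty list means the format checks out
```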

Finally, scoring would be done using a private test set provided by the organizers: 70% of the final score was based on the correctness of the outputs above, and only 30% on the on-site demo performance.

This scoring mechanism was completely different from the Mei-Chu Hackathon I had participated in before, and one issue left me quite puzzled:
The private set had only 107 images, and we were given nearly 1.5 hours to generate results. If a team wanted to, manually annotating all the answers wouldn’t be impossible. How could they ensure no team filled in the answers by hand? Staff were monitoring and warned us not to cheat, but I still found the setup somewhat unreasonable.
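The back-of-the-envelope arithmetic behind that worry is simple:

```python
# 107 private-set images, nearly 1.5 hours (90 minutes) to produce results.
images = 107
minutes = 90
seconds_per_image = minutes * 60 / images
print(round(seconds_per_image, 1))  # -> 50.5 seconds per image, enough to eyeball most questions
```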

Problems Encountered on GCP (Google Cloud Platform)

This time they also had Google Cloud provide computing resources for us to use, but our experience using it wasn’t very smooth.

When we tried to finetune InstructBLIP, we once exhausted the RAM and broke the entire instance. In fact, the instance they provided broke twice:

  • The first time was two days after we received the hackathon resources, when I suddenly received an email from the organizers telling us our instance had crashed and asking whether we had run any commands. We hadn’t done anything special, so we still don’t understand what happened to the machine.
  • The second time was the day before the competition, when we tried to load InstructBLIP onto the instance to finetune it; this apparently filled up the disk, and the entire instance crashed.

Besides the instance, there were also problems with the bucket they provided:
Both before and during the competition, He-Kun and I hit permission errors when uploading large files from the instance to the bucket. It wasn’t until the Google Cloud team came to provide on-site support that day that they found the cause: large uploads issue an additional “get” request against the bucket, and our instance initially didn’t seem to have that permission.

That said, I must mention that the staff were really proactive - they basically addressed problems immediately. While they were fixing the bucket issue I could really feel their enthusiasm, which also put considerable pressure on me (having 6-7 people crowded around you is no joke…). Anyway, I’m really grateful for their help!

Results

We ran into plenty of trouble during the competition: my lingering gastroenteritis, getting rejected from Google’s internship application, and losing a team member on the second day (he went to Japan 😢). All of this affected our performance over the two days, and what we accomplished fell far below my expectations. But I think the more important reasons we underperformed were that we “started too late” and “misunderstood the problem.”

Started Too Late

After the workshop ended, we basically only did light paper reviews and tested three architectures: LLaVA, BLIP, and Flamingo. We discovered very late that the resources the organizers provided weren’t enough to run BLIP or Flamingo, so a lot of time went into work that contributed little to the final result. On top of that, because of my health during those days, I only did some simple data analysis and never even ran through their sample code to finetune LLaVA - that didn’t happen until competition day. So we really did start working too late, and with only the two days of the competition, this was about all we could produce.

Misunderstood the Problem

When we first saw the problem, our reaction was: “do the first few questions really need generative models?” They could be solved well with the CV models we had learned, so why use a generative model at all (of course, for reasoning questions like Q7, we agreed one was necessary)? So we wrote to the organizers asking whether we could answer with computer vision models. Their response was: “we don’t recommend using specialized models to answer one by one, because questions like #7 require more general capabilities.” Although we felt this didn’t answer our question, we took it to mean they “didn’t want” us to use non-generative models. Then, on competition day, we saw many teams feeding CV model outputs together with the questions into new prompts for an LLM to generate the final answers. Besides admiring those teams’ creativity, we regretted misreading the organizers’ intent and never thinking in the direction of combining image recognition with LLMs. This was the most regrettable part of the competition.
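The pattern those teams used, as I understood it, can be sketched roughly like this - the detector output shape and the prompt wording here are hypothetical, just to illustrate folding structured CV results into an LLM prompt:

```python
# Hypothetical detector output for one image: one entry per detected person,
# with the helmet color if a helmet was detected, or None otherwise.
detections = [
    {"label": "person", "helmet": "yellow"},
    {"label": "person", "helmet": "white"},
    {"label": "person", "helmet": None},  # no helmet detected
]

def build_prompt(detections, question):
    """Turn structured CV results into a text prompt for an LLM to answer."""
    people = [d for d in detections if d["label"] == "person"]
    with_helmet = [d for d in people if d["helmet"]]
    facts = (
        f"A detector found {len(people)} people in the image; "
        f"{len(with_helmet)} wear helmets "
        f"(colors: {sorted(d['helmet'] for d in with_helmet)}) and "
        f"{len(people) - len(with_helmet)} do not."
    )
    return f"{facts}\nBased on these facts, answer: {question}"

prompt = build_prompt(detections,
                      "Is there any people not wearing a helmets? "
                      "Please answer with true or false.")
print(prompt)  # this prompt would then be sent to the generative model
```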

Reflections

This competition feels regrettable because we didn’t give our full effort. We did receive praise from the judges: “What a pity! You were right on the border (4th place)! And you’re only juniors, so young - the other teams are all graduate students!” It was probably just consolation, but hearing it still made me feel better, which is kind of magical. Still, this experience was a perfect chance to explore the whole LLM architecture and to actually use DeepSpeed to finetune a model, which lined up perfectly with the tasks my lab professor would later give me. So I’ve changed my mindset: the gains from this competition are already quite sufficient, and the rest is left for my future self to work on. I hope that when my future self looks back on this competition, I’ll feel that I’ve grown a lot!


Author: Torrid-Fish
Posted on: January 27, 2024
https://torrid-fish.github.io/life-tsmc_2024_hackthon/