Visual Analytics Science and Technology (VAST) Challenges internship at Sidata+

Khunruksa S.
Aug 11, 2022
Photo by Clay Banks on Unsplash

An internship is one of the most important experiences you can have as an undergraduate. Gaining work experience is critical for improving your career prospects and understanding real-world working situations, particularly as an IT student. That is why, after my fourth year at KMITL and some experience in software development, I looked forward to exploring the data science domain once I had finished my senior project. I wanted to broaden my knowledge of how to process raw data so that it is simple to absorb and ready to use.

I was fortunate to discover an internship opportunity on the Sidata+ website. Mahidol University’s Faculty of Medicine (Siriraj Hospital) has a well-established reputation as a leading institution for medical education and research in Thailand. In February 2020, the Siriraj Informatics and Data Innovation Center (SiData+) was founded to raise the bar in medical data and research. It is responsible for supervising data management, data governance, and data innovation in order to transform the organization and provide new medical services.

This year, a number of internship positions were available. The most interesting one for me, based on my experience, skills, and interests, was the Visual Analytics Science and Technology (VAST) Challenges internship. This role takes part in an international visual analytics competition that runs from May to July every year, so I chose this position.

Outline

  • Internship experiences
  • What is VAST challenge?
  • Result
  • Conclusion

Internship experiences

Because of the Covid-19 situation, we did most of the internship working from home (WFH). We had a morning catch-up meeting over Zoom every Tuesday and Thursday to sync on our VAST Challenge progress: what we had done, what we were doing, and what we were going to do next. Besides sharing insights and ideas about the VAST Challenge, we were also mentored by P’Max and P’Pooh on a wide range of data visualization topics. We learned how data helps the hospital manage its resources effectively and efficiently as the situation changes, for example through a bed-allocation dashboard for patients that supported discussions during the Covid-19 outbreak, and through other internal visualizations that help the executive board make policy decisions to drive the organization.

Data Viz at Sidata+

We also attended a sharing session by the Sidata+ visualization team at DIGI Data Camp 2022 on the topic “Dashboard relationship between generation & Work from home compared with satisfaction and engagement in the great resignation crisis”. The session shared the insightful process of intensive requirement and data gathering with the Human Resources (HR) team, which turned internal data into a summary that helps the organization recognize its human resource situation during Covid-19. From this session, we learned how data analysts transform data into valuable results.

First meetup 🙌🏻

During the internship, our team decided to hold our first in-person meetup at Sidata+. In the morning, P’Pooh gave us a lecture on Introduction to Data Science. We learned the basic data types that are commonly seen in data, such as categorical (qualitative) data and numerical (quantitative) data, as well as measurement scales, which define and categorize variables. Each scale (nominal, ordinal, interval, and ratio) has certain properties that in turn determine which kinds of analysis are appropriate for it. We also learned how to achieve a goal in data science through the six steps of the data science process. Here are a few details I noted from the session.

  1. Setting the research goal: clarify the purpose of the research and exactly what goal we want to achieve, then plan the milestones toward it. The outcome should be a clear research goal and a good understanding of the project.
  2. Retrieving data: collect the data and ensure that you can use it, which means checking the existence of the data, the quality of the data, and access to the data.
  3. Data preparation: improve the quality of the data and prepare it for use through data cleansing (removing false values from a data source and inconsistencies across data sources), data integration (enriching a data source by combining information from multiple data sources), and data transformation (ensuring the data is in a suitable format for the data model).
  4. Data exploration: use methods such as data visualization to reduce the amount of information and understand the data.
  5. Data modeling
  6. Presentation and automation: present the results to your business. The results can take many forms, ranging from presentations to research reports. Automation means reusing the insights you gained in another project or enabling an operational process to use the outcome of the model.
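
To make the data preparation step more concrete, here is a minimal sketch of cleansing, integration, and transformation in plain Python. It is not code from the lecture or from our project: the records, field names, and the `ward_capacity` lookup are all hypothetical and exist only to illustrate the three sub-steps.

```python
# Hypothetical raw records; field names and values are made up for illustration.
raw_visits = [
    {"patient_id": "P01", "age": "34", "ward": "ICU "},
    {"patient_id": "P02", "age": "-5", "ward": "icu"},      # impossible age
    {"patient_id": "P03", "age": "61", "ward": "General"},
    {"patient_id": "P03", "age": "61", "ward": "General"},  # duplicate row
]
ward_capacity = {"icu": 10, "general": 40}  # second, hypothetical data source


def cleanse(rows):
    """Data cleansing: drop duplicates and rows with impossible values."""
    seen, cleaned = set(), []
    for row in rows:
        age = int(row["age"])
        ward = row["ward"].strip().lower()  # fix inconsistent spelling
        key = (row["patient_id"], age, ward)
        if age < 0 or key in seen:
            continue
        seen.add(key)
        cleaned.append({"patient_id": row["patient_id"], "age": age, "ward": ward})
    return cleaned


def integrate(rows, capacity):
    """Data integration: enrich each row with information from another source."""
    return [{**row, "ward_capacity": capacity[row["ward"]]} for row in rows]


def transform(rows):
    """Data transformation: reshape rows into counts per ward for a later model."""
    counts = {}
    for row in rows:
        counts[row["ward"]] = counts.get(row["ward"], 0) + 1
    return counts


prepared = integrate(cleanse(raw_visits), ward_capacity)
patients_per_ward = transform(prepared)
print(patients_per_ward)
```

In a real project each sub-step would be driven by what the data owner considers a false value or a suitable format, so the rules above are only placeholders.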

We exchanged a lot of information, which led to a more in-depth discussion on VAST Challenge planning. The meetup also broke the ice and made everyone feel comfortable with the team.

📊 Data visualization concept

Before diving into data analysis, the first things we faced in our internship were the ambiguity and complexity of the data. The context of a dataset is closely tied to its own characteristics. Understanding the requirements and limitations is an initial step that most projects must handle properly and thoroughly. It takes intensive and consistent effort: researching the data, discussing with the data owner, and brainstorming with the team. It is hard to get through this step without a framework or guideline.

Designing a visualization concept helps us with this part: understanding the domain situation, what is really needed, and why.

A Nested Model for Visualization Design and Validation [4]

The nested model for visualization design and validation provides a four-layer thinking framework for formulating a data visualization that meets the user’s needs. It can be described as four hierarchical steps.

  • First, the domain situation is the initial level, which describes the tasks and data in the vocabulary of the problem domain. If we work on the wrong problem, we end up building something the users do not need, because the audience does not in fact have these problems.
  • Second, data/task abstraction dives into the operations and data types that need to be clarified. In data abstraction, we need to answer a few questions carefully to understand the data: What data is needed to address the problem? Where is the data available? What measures and dimensions are in the data? In task abstraction, we need to clarify the activities that would be useful to the users who interact with the dashboard: What action is expected of the audience (discovering an overview, comparison, classification, forecasting)? What is the audience’s target (comparing magnitude or distribution, ranking, comparing over time, correlation, similarity, flow)? If we get the abstraction wrong, we show the wrong thing to the users.
  • Third, visual encoding/interaction idioms is the process of choosing a design that communicates effectively to the audience. If we choose the wrong visual encoding or interaction, the way we show the data does not work for the audience. Examples include picking a color palette that suits the data (temperature data should clearly differentiate hot and cool conditions) and avoiding interactions that fail the audience (such as a time-range selection that does not return the aggregated result).
Global Temperature Graph (1851–2020) from This Global Temperature Graph Shows Climate Trends (1851–2020) (visualcapitalist.com)
  • Finally, the algorithm level addresses the computer science aspect: ensuring that the computational complexity (time and memory) is acceptable for rendering the visualization. The wrong algorithm makes the implementation too slow.

The output from a level above is the input to the level below, which draws attention to the design challenge that an upstream error inevitably cascades to all downstream levels. This concept leads us to doing the right thing and doing it the right way.
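
As a small illustration of the visual encoding level, the temperature example can be sketched as a diverging color scale, where cool values fade from blue to white and warm values fade from white to red. This is only a sketch under my own assumptions, not the palette used in the graph above: the function name `diverging_color` and the clipping bound `limit` are hypothetical.

```python
def diverging_color(value, limit=2.0):
    """Map a temperature anomaly to an RGB tuple on a blue-white-red scale.

    `limit` is a hypothetical clipping bound: anomalies at or beyond
    +limit are fully red, at or beyond -limit fully blue, and 0 is white.
    """
    # Clip to [-limit, limit] and rescale to [-1, 1].
    t = max(-limit, min(limit, value)) / limit
    fade = int(round(255 * (1 - abs(t))))  # shared channels fade toward white at 0
    if t >= 0:
        return (255, fade, fade)  # warm side: white -> red
    return (fade, fade, 255)      # cool side: white -> blue


for anomaly in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(anomaly, diverging_color(anomaly))
```

The point of a diverging scale is that the neutral midpoint is visually distinct from both extremes, so the audience can tell hot from cool at a glance.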

Doing The Right Thing And Doing It The Right Way

📊 WHAT IS VAST?

The following section explains this year’s VAST Challenge competition.

IEEE Visualization Conference 2022 (VIS 2022)

The goal of the annual IEEE VIS conference is to advance the fields of scientific visualization, information visualization, and visual analytics [1][2]. As part of the conference, the VAST Challenge is designed to help researchers understand how their software would be used in a variety of analytic tasks and to encourage innovation in data transformations and interactive visualizations through competition. Researchers and software providers often use VAST Challenge data sets, which have ground truth embedded in them, as benchmarks to demonstrate and evaluate the capabilities of their systems.

VAST Challenge 2022

The datasets and tasks released for VAST Challenge 2022 are built on a synthetic dataset with an injected ground truth. This year’s challenge aims to use visual analysis to understand the current state of the fictional city of Engagement, Ohio, and to suggest further improvements that benefit everyone in the city. In the dataset, over 1,000 participants allowed the recording of the places they visited, their spending, and their purchases over 15 months. Our tasks were to identify the state of the city by visualizing the data to characterize the city’s traits and demographics and understand its needs, and to use visual analysis to plan how to improve the city’s quality of life and finances [3].

Logical diagram on VAST 2022 dataset.

There are three main challenges, plus a grand challenge with extra tasks for teams that complete the main challenges. First, the Demographics and Relationships challenge examines the available data to build a one-page fact sheet on the city’s demographics, neighborhoods, and business base. Second, the Patterns of Life challenge aims to clarify some representative people’s daily routines, characterize travel patterns to detect potential bottlenecks or risks, and investigate how these patterns change across time and seasons. Third, the Economic challenge tries to demonstrate how businesses grow or shrink, how jobs change over time, and how living conditions improve or decline. Finally, a team that completes all of the previous tasks can summarize its assessment of the city and indicate where the city improvement grant should be invested as its grand challenge response.

Result

Our team decided to take on all three challenges. We assigned a main task to everyone and shared our results so that findings could contribute across the challenges. This is the work we submitted to the VAST Challenge this year.

Challenge 1: Demographics and Relationships

Engagement residents are interested in learning the characteristics of the town and its demographics in order to understand the following aspects of the town: age distribution, degree completion, young generations, school availability and affordability, the job market, the financial landscape, and so on.

The dataset contains demographic information on 1,011 volunteers, such as age, interests, dependents, household size, and education level. The volunteers also provided daily financial, travel, and check-in journals as well as their social network connections. To accompany the demographic data, there is a map of the buildings in the city. The primary tasks of the visualization include discovering the patterns hidden in this representative group and summarizing the quality of living in the city.

First of all, the demographic dataset contains both categorical and continuous data. We use distinct colors in each plot to make it easy to understand at first glance, and we rely on simple data visualization techniques such as bar charts, pie charts, box plots, and summary numbers. The charts are interactive and support cross-filtering across the dashboard to facilitate exploratory analysis.
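
Before a bar chart can be drawn, categorical values need to be counted and continuous values need to be binned. The sketch below shows that preparation step in plain Python. We built our dashboards in Tableau, so this is not our actual workflow, and the records, education labels, and the `age_group` helper are hypothetical.

```python
from collections import Counter

# Hypothetical participant records; labels and values are made up for illustration.
participants = [
    {"age": 23, "education": "Bachelor"},
    {"age": 37, "education": "High school"},
    {"age": 41, "education": "Graduate"},
    {"age": 28, "education": "Bachelor"},
    {"age": 55, "education": "High school"},
]


def age_group(age, width=10):
    """Bin a continuous age into a labelled range such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Categorical data: count each category directly.
education_counts = Counter(p["education"] for p in participants)
# Continuous data: bin first, then count the bins.
age_counts = Counter(age_group(p["age"]) for p in participants)

print(education_counts)
print(age_counts)
```

Each counter maps directly onto one bar chart, with one bar per key.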

You can visit our work at Challenge 1 Tableau Public Dashboard.

Engagement, Ohio: one-page fact sheet summary
(left) NodeXL illustrates the social network between participants by interest group (right) Red lines stand for relationships between participants

Challenge 2: Patterns of Life

Assuming the volunteers are representative of the city’s population, this challenge asks us to characterize the distinct areas of the city based on venue types and the participant activities around those venues. We give an overview of the city and classify its areas by developing an interactive city map with zoning, venue types, and detail filters.

You can visit our work at Challenge 2 Tableau Public Dashboard.

City map with auto-generated zoning map (generated by Power tools)
(left) Participant #673 travel route and daily life tracker (right) Participant #819 travel route and daily life tracker

Challenge 3: Economic

The main questions and objectives involve analyzing the relationships between participants, employment, employers, and owners. We look at the profit and loss of each business; group participants by daily food cost, wage, and money left for extra expenses to determine their financial health; and determine the financial health of employers, business growth rates, and employee job changes. Using time series, we create a simple line graph to depict expenses over time, and we classify people based on how frequently they change employment.
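
The job-change classification and the expense-to-wage ratio can be sketched as follows. This is a simplified illustration rather than our Tableau implementation: the employment log, the monthly totals, and the `frequent_threshold` cut-off are all hypothetical.

```python
from collections import Counter

# Hypothetical employment log: (participant_id, employer_id) in time order.
employment_log = [
    (1, "A"), (1, "A"), (1, "B"),
    (2, "C"), (2, "C"), (2, "C"),
    (3, "A"), (3, "B"), (3, "C"), (3, "D"),
]
# Hypothetical monthly totals per participant: (wage, expense).
monthly_totals = {1: (3000.0, 1500.0), 2: (2500.0, 2400.0), 3: (4000.0, 1000.0)}


def count_job_changes(log):
    """Count how often each participant's employer differs from the previous record."""
    changes, last_employer = Counter(), {}
    for participant, employer in log:
        if participant in last_employer and last_employer[participant] != employer:
            changes[participant] += 1
        last_employer[participant] = employer
        changes.setdefault(participant, 0)  # keep participants with no changes
    return dict(changes)


def job_stability(n_changes, frequent_threshold=2):
    """Label a participant; the threshold is an arbitrary illustrative cut-off."""
    if n_changes == 0:
        return "stable"
    return "frequent changer" if n_changes >= frequent_threshold else "occasional changer"


job_changes = count_job_changes(employment_log)
labels = {p: job_stability(n) for p, n in job_changes.items()}
# Ratio of expense to wage: values close to 1 mean little money is left over.
expense_ratio = {p: expense / wage for p, (wage, expense) in monthly_totals.items()}

print(labels)
print(expense_ratio)
```

Combining both measures per participant gives a rough view of financial health: someone with a high expense ratio and frequent job changes is likely in a more fragile position than someone with a low ratio and a stable job.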

You can visit our work at Challenge 3 Tableau Public Dashboard.

Available balance & Ratio of expense and wage
(left) Income & expense type (right) Number of employees & job changes

Conclusion

After joining the Visual Analytics Science and Technology (VAST) Challenges internship at Sidata+, I have acquired a lot of knowledge. The internship provides a broad understanding of data science and visualization through exposure to a competitive environment. Complicated solutions do not guarantee that a problem gets solved; using the proper tools on top of well-understood fundamentals can lead to better outcomes. Experienced mentors are also crucial in steering us in the right direction during the internship, and P’Max and P’Pooh provided very thoughtful supervision. Thank you to the interns on the team, who made an effort to collaborate and contributed valuable experiences and incredible outcomes to our internship program. Finally, thanks to Sidata+ for providing internship opportunities for people interested in data analysis and data science to gain practical knowledge and engage with a frontier health data science department in Thailand.

Are you interested? Don’t forget to stay connected with Sidata+ to join the next batch of this program.

References

[1] IEEE VIS, IEEE Visualization — Wikipedia

[2] IEEE VIS 2022, Welcome to IEEE VIS 2022!

[3] VAST Challenge 2022 (vast-challenge.github.io)

[4] Munzner, Tamara. (2009). A Nested Model for Visualization Design and Validation. IEEE Transactions on Visualization and Computer Graphics, 15(6), 921–928. doi:10.1109/TVCG.2009.111.
