The Secret Lives of Millennial CS Assistant Professors (Part 1)

Arun Kumar
Jan 2, 2022


OK, just N=1 Asst Prof. I love avocado toast with a fried egg, caramel lattes, creating/sharing memes, using social media (primarily Twitter and Facebook), taking vacations and traveling, and relishing CS-enabled “creative destruction” of entire industries. There, with my millennial cred now out of the way, let us get down to business. :)

Why bother writing this article? Probably as a personal memoir of a major chapter of my life. Perhaps also to shed light on the life of a new Asst Prof in CS these days. Maybe it will help some new junior faculty or grad students.

I started as an Asst Prof at UC San Diego CSE in Fall 2016. I took on a joint appointment with HDSI from Fall 2019. I got tenured in Summer 2021, five years in. It feels like time just flew by! I am grateful to the many people in my life who helped me reach this milestone. I hope to continue paying it forward. Perhaps this post is also a part of that process.

Tandem paragliding at Torrey Pines over the gorgeous beaches of La Jolla. Yes, that is me in the front. :)

This article has 7 thematic sections: Research, Funding, and Collaborations; Dissemination, Community, and Service; Advisees and Colleagues; Teaching and HDSI; DEI and Outreach; Personal Life and Health; and other overall observations, including things I wish I knew earlier, mistakes in hindsight, and things I am still uncertain of as I look toward my life’s next chapter. This Part 1 covers the first three sections; Part 2 will cover the rest. I specifically call out key “Lessons” from my experiences throughout.

Research, Funding, and Collaborations

I might write a future post on the technical content/evolution of my research but this article will stay at a meta level on my research path. If you are curious to learn more about my research, read this one-pager, listen to this podcast, or watch this talk video. My primary interests are in data management and systems for ML/AI-based analytics. This is at the intersection of 3 traditional areas of CS: DB, systems, and ML/AI. I have worked at this intersection for 11 years now, including my PhD days. But to transition to Asst Prof life, I had to answer 3 key questions for myself:

  1. Problem selection: What do I work on and how to decide?
  2. Funding: How to get my projects funded?
  3. Collaborations: How to decide which ones to pursue?

Problem selection is at the heart of academic freedom. I realized my taste is nicely captured by NSF’s criteria: intellectual merits and broader impacts. The former is about uncovering new universal truths and creating new non-trivial knowledge — mainly deontological. The latter is about advancing the state of practice and the wider society/world — mainly utilitarian. Both are crucial. The former alone may lead one to contrived, artificial, and ultimately useless work. The “ivory tower” pitfall is surprisingly pervasive. But the latter alone may lead one to mundane and boring work that is just development, not research — a waste of academic freedom. Academia is not some outsourced research wing of industry on the cheap. Thus, I strove/strive to balance both criteria within and across my projects.

But the above is still too high level. At a lower level, I asked/ask the following 3 sub-questions. This process was not actually so linear; I am likely just retrofitting some latent structure onto what I did. :)

1.1) Where is practice/industry likely headed for the foreseeable future?

1.2) Are there open non-trivial research questions in those directions?

1.3) Which questions interest me that I am equipped to answer?

To answer Q1.1, I enjoy conversing with all kinds of data/ML/AI/software practitioners. I have interacted with over 4 dozen over the years across various settings: enterprises, Web companies, domain sciences, etc. I also like reading practitioner surveys by Kaggle, KDNuggets, etc. All this helps me stay on top of my area’s big picture, including scope for impact. Not all academics care about Q1.1 though. Some like to do purely “out there” stuff with a higher risk of their work being irrelevant in practice. But I think it is highly unlikely that academics can impact practice in a big way on their own in today’s “Big Tech era” of CS vs its early decades. Anyway, to each their own — this is the point of academic freedom after all!

Slide from an overview talk about my research “The New DBfication of ML/AI”: https://www.youtube.com/watch?v=I8OUnzgfkWY

To answer Q1.2, I stay away from practitioners. :) Most of them are too busy with their job’s daily grind to ponder generalizable or longer-term questions. With some distance, I weave concrete practical knowledge with more abstract scientific thinking to craft general, interesting, and timely research questions. I know of no magic formula for the abstract part. Just read widely, speak with researchers in other areas/fields (or watch talks), and make interesting technical connections across areas. In terms of number of projects, I wanted a strategic balance of depth and breadth. Some work on just “one big system” but I saw that strategy as too risky due to major upheavals in CS in the mid 2010s (cloud, DL, etc.). Such “one-trick pony” acts also waste academic freedom. So, I pursued a handful of related but complementary projects.

Finally, Q1.3 presented a dilemma: explore vs exploit. I have neither the time nor the technical chops to study all types of questions. So, I prioritized ones that I could make major headway on, at least as an Asst Prof. I skipped many potential projects that did not suit my expertise or taste. I also balanced extensions of my PhD work (easy to exploit for papers) with riskier from-scratch new projects. IMO just continuing one’s PhD topic as Asst Prof fails to show intellectual independence from one’s advisor(s) and wastes academic freedom. I also pursued two highly exploratory projects just because I was curious, got NSF funding (yaas!), and sought intellectual growth in those directions. After all, the academic life is one of continual learning. Just rehashing what one already knows is a recipe for stagnation.

Lesson: Freedom of problem selection is one of the most important differentiators of academic life. Use it prudently. Balance exploratory and exploitative work. Stay aware of the state of practice. Weigh potential for impact.

Coming to Q2 on funding, my primary sources were/are NSF, industry, and NIH. I have no major experiences with other sources. I did speak with a DARPA program officer once but it went nowhere. See this recent post for a summary, statistics, and lessons on my proposal rejects/accepts. Regardless of sources, I like to classify my grants/gifts into 3 categories:

A. I wrote it from scratch myself as sole PI or lead PI in a collaboration.

B. I am a full collaborator on the project and a co-PI.

C. I am a partial collaborator and a co-PI or senior personnel.

My faculty mentors at UCSD CSE told me early on that we do not have “lower bounds” for how much funding an Asst Prof pulls in; we just look for earnest productivity in grant applications. This amazing culture certainly reduced funding-related stress for me. That said, by the time I filed for tenure, I had managed to pull in $1.5mil in category A, $0.5mil in category B, and $4.1mil in category C. Hopefully not too bad for an Asst Prof. :)

Salient statistics on my proposal submissions from my blog post “On Rejections in Academia”: https://thedatadossier.blogspot.com/2021/05/on-rejections-in-academia_2.html

NSF grant writing is a huge pain — I won’t sugarcoat it. America’s obsession with runaway Capitalism (I critique it more in this post) means NSF budgets are kept low to spur massive competition: funding rates are at an abysmal 10% for core and 20% for CAREER. My CAREER got declined twice. In fact, I got a core Small as PI and was added to a large NIH grant as co-PI well before my CAREER got funded! To be fair, my first attempt was not as compelling in hindsight as the latter ones. The reviews did help me sharpen and deepen my pitch. I also failed to appreciate the importance of serving as an NSF panelist early on. I did do so later, and it helped me understand the process better. In spite of its issues, I now think NSF is a gem of American academia — I salute the dedication and vision of their staff and volunteers!

Lesson: Most reviewer feedback, including negative feedback, will be helpful in some way. Get some distance first. Then keep improving with humility. Volunteer to review NSF proposals as soon as you start as Asst Prof.

Finally, coming to Q3 on collaborations, I strove/strive again for a balance. I think it is unwise for an Asst Prof to pursue only collaborative work because it reduces intellectual independence. I was/am also ruthlessly strategic in choosing my collaborations. If a proposed collaboration is an application or extension of my existing work, I’d typically say yes. This was the case for my NIH grant on a behavioral health project. Their TB-scale data and workloads inspired new DL systems ideas in Project Cerebro. That ultimately led to my CAREER grant for which my collaborator also gave a support letter. This is my most successful collaboration so far: it advanced both core CS and public health research, led to open source artifacts used by domain scientists, and led to transfer of research to products (more on this soon).

Slide from an overview talk on Project Cerebro: https://www.youtube.com/watch?v=ERKyvLT4Wik

For other potential collaborations, I have 4 criteria. Is the topic exciting and likely to be impactful/visible? Is my expertise relevant and complementary? Are the collaborators enjoyable to work with? Will there be requisite time and funding? If any of these are not satisfied, I’d say no. This approach has worked well for me. I’d also advise CS Asst Profs in ML/AI or DB areas in particular to be extra careful in choosing collaborations. ML/AI and DB skills tend to be in high demand among domain scientists, making it easy to get sucked into boring (from a CS standpoint) collaborations.

Ultimately, not all of my research projects were/are successful by utilitarian impact metrics. So, I kill projects using a Darwinian mechanism. It saddens me when a project fails to pan out. But I am fine with it because the main “utility” of academic research is training students. All my projects led to top-tier student-led papers. I also ended two collaborations when they stopped satisfying some of my 4 criteria. In one case, the collaborator also wanted to end it; in the other, I helped them find a replacement.

Lesson: Choose collaborations carefully based on relevance, impact potential, enjoyability, time, and funding. Learn to say no well. Kill projects as needed.

Finally, I was prolific in targeting industry gifts, writing over 20 proposals in 5 years. By the time I filed for tenure (4 years in), I had received 5 gifts: 2 Google, 1 Oracle, 1 VMware, and 1 Opera Solutions (and 1 NVIDIA GPU grant). While each gift is small in size (1 student-year) relative to NSF grants, they are unrestricted in topic, time, and personnel; this flexibility is very useful. I’d advise Asst Profs to target all relevant industry gift calls. I did not care for industry grants, however, because I feel they compromise academic freedom.

Huge thank you to all these sponsors of my group’s research! Thank you also to my past research sponsors: Opera Solutions, NVIDIA, and the Hellman Foundation.

The above gifts enabled productive and fun collaborations with some industry teams at Google, Oracle, and Pivotal/VMware. We helped transfer code/ideas from our research to their software products: GraalVM (see this), Apache MADlib (see this), and TensorFlow Extended (see this). I have recently started a similar collaboration with Amazon too (full list here). All this underscores a key exclusive privilege of academic life: we get to work with, and learn from, multiple arch-rival companies simultaneously! :)

Lesson: The CS field is fortunate to have a large and innovative industry. Pursue relevant industry funding and collaborations to amplify research impact.

Dissemination, Community, and Service

Like it or not, we are all part of a competitive marketplace of producers and consumers of ideas. So, some form of “marketing” is inevitable. Unlike even 20 years ago, CS is now so big that without active dissemination effort, it is easy for even important research to get buried. This is especially true for Asst Profs because they lack the name recall of famous senior faculty.

So, to help disseminate my research, apart from publishing/presenting at top research conferences, I enjoy doing 3 things: blogging, social media, and industry interactions. I have 5 blogs, no less! One for my research takes, a more official lab research blog, one for my sociopolitical takes, one for my poetry, and these stories on Medium. I’d advise all Asst Profs to maintain at least a lab research blog with their students’ help to periodically summarize their papers in simpler language, share their research takes, etc. Another key benefit of blogging I have found is that it improves storytelling and rhetorical skills, both of which are crucial for academics.

I am active on Twitter, routinely posting about my (students’) papers, other relevant announcements, hot takes on research/teaching, and joining relevant discussions. Twitter to me is a raucous but fun virtual cafe to hang out with fellow academics, industry friends, etc. I have also made a few serendipitous connections there, including new industry collaborators and tech media folks. It enables a virtuous circle of dissemination.

Follow me on Twitter @TweetAtAKK if you do not already. :)

Finally, I like attending top industry conferences occasionally to give talks and chat with practitioners. I went to O’Reilly Strata Data Conference in 2019 and gave a talk. A couple of my students and I attended Spark+AI Summit in 2020 and they gave a talk. When I travel, I ping industry friends to visit their team and give a talk if there is interest; sometimes, they invite me. There is no “shame” in inviting oneself. Most such meetings lead nowhere but that is okay. In my case, only 2 out of ~12 such visits/talks led to new (funded!) collaborations: Opera Solutions and Google TFX.

The most important dissemination mechanism is, of course, publishing at top-tier research conferences. I did/do this a lot at SIGMOD and VLDB. As I say in my post on rejections, I did dip my toes into MLSys, SOSP, ICML, and KDD but with no luck on full papers so far. Some of my papers are a fit for any of these venues but VLDB/SIGMOD are the most pertinent. They also appeal the most to both me and my students due to their one-shot revision and multi-deadline features. Interestingly, my ex-advisors and faculty mentors advised me to not spread myself too thin by publishing across areas because it will be easier for tenure letter writers if one’s publication record is more focused. I agree with that caveat, but it is becoming more common in CS for Asst Profs to publish across areas. I am sure senior faculty can keep up. :)

Lesson: Research conferences are just one dissemination avenue. Leverage social media, blogging, and industry visits/conferences to raise your work’s visibility.

Salient statistics on my paper submissions from my blog post “On Rejections in Academia”: https://thedatadossier.blogspot.com/2021/05/on-rejections-in-academia_2.html

Speaking of SIGMOD, VLDB, and MLSys, my blog posts critically analyzing some of their issues raised many eyebrows, especially this post on “DB culture wars” and its sequel. Curiously, some junior faculty in DB and systems areas were “fearful” of speaking publicly. Such self-muzzling saddens me. Fear is indeed rational in some scenarios but freedom of speech is non-negotiable to me for objective critiques, comparing ideas, etc. Open debate and reasonable discord are integral to academic freedom. I doubt most senior faculty in CS are so petty or vindictive as to let such things cloud their evaluation of an Asst Prof’s research record. Apart from the blog posts, I thoroughly relished/relish this freedom in many other ways:

Welcome to the era of the bloated ML/AI/cloud whales! A slide from my talk at the KDD’21 Deep Learning Day: https://www.youtube.com/watch?v=UP9__WsfSuc Enjoy the memes and limericks! :)

Tell me, which industry person can do all the above without getting fired? ;) To be fair, my spouse often cautions me that I am likely stirring the pot too much and making “enemies” needlessly. Somehow I just do not care about that. Perhaps I am too influenced by the awesome social satire of Vivek in Tamil movies I watched as a kid and South Park I watched as a grad student! :D One caveat is that controversial opinions often attract critical counter-speech — I see that as normal heat in the fiery kitchen of free speech. Interestingly, some of my tenure letter writers actually praised my critical blog posts and my “talent for being outspoken” (in their words)!

Thankfully I do not need to be a DeWitt because the ML/AI industry does not seem to have an Ellison. :) Read the fun story here: https://en.wikipedia.org/wiki/David_DeWitt

Finally, I chose to serve on the PCs of both SIGMOD and VLDB every year and of CIDR twice. I also helped SIGMOD DEEM Workshop and MLSys a few times. All that gave me both insights into peer review and visibility. I said no to many lower tier venues and journals because I don’t publish there. I’d advise Asst Profs to serve ~2 top venues and ~2 other focused venues per year. Any more is just asking for burnout. To me PC work for one’s “publishing home” is like doing chores at home (like laundry), not some glorious service. Do your chores earnestly. Don’t be a freeloader. I see non-PC organizational roles as more genuine service. I took up a few such roles that were meaningful to me: running SoCal DB Day in 2018, helping VLDB’21 launch the SDS category, helping SIGMOD’21 on diversity and inclusion.

Lesson: Freedom of speech is integral to academic freedom. Use it prudently to improve research, practice, and community. Go beyond reviewing for service.

At the inaugural SoCal DB Day in Fall 2018: https://sites.google.com/eng.ucsd.edu/socaldb18 We had DB folks from 6 SoCal schools (UC San Diego, UC Irvine, UCLA, UC Riverside, UC Santa Barbara, and USC) and from 6 companies with DB area presence in SoCal (Amazon, Couchbase, Google, Microsoft, Oracle, and Teradata).

Advisees and Colleagues

David Patterson famously said “students are the coin of the academic realm.” I cannot agree more. Working with bright and diligent students is one of the main reasons I chose this career. Almost everything I did as an Asst Prof revolved around helping students: my research advisees, my course students, other students on campus, external interns/mentees via UCSD’s STARS and MAP programs, other students in the DB community, etc. Let me explain how I navigate(d) research advisees in particular.

The quality of research advisees one can attract is a key factor for Asst Profs when choosing schools. I was reasonably confident that UCSD CSE’s high research reputation and San Diego’s appeal meant that I’d fare okay on this front. Indeed, when I had to compete on offers I “won” against many ostensibly “higher ranked” (by mostly bogus US News) schools: UIUC, Georgia Tech, Washington, and Wisconsin (my alma mater!). So far I’ve “lost” against only Stanford, MIT, and CMU; let’s see for how long. ;)

I recruited from 3 pools: CSE MS students, external PhD applicants, and CSE/HDSI’s BS students. The first pool is huge: CSE gets 400+ per year! I use my advanced grad course, Data Systems for ML, as a strong filter. I ask them to do an independent project for 1–2 quarters extending a prior paper. If they do outstandingly well, I’d take them on as RAs. I have also converted some MS advisees to PhD (a CSE-internal process). For external PhD applicants, I do virtual interviews to vet for match, technical chops, and research potential. It is riskier though. CSE also requires faculty to commit some funding per PhD offer, which rate-limited my offers. Finally, I like mentoring stellar BS students on research. I encourage them to apply elsewhere. But some returned to my group for their PhD anyway. :) Overall, I took care not to grow my group too much too fast. It went up naturally over time with more funding. When I filed for tenure, I had 5 PhD + 3 BS advisees and 4 MS + 2 BS alumni.

Lesson: Grow your research group prudently over time, governed by funding and project rationales. Use a high bar to avoid needing to let advisees go later.

With some of my research advisees (and our spouses) at a farewell dinner in La Jolla for some of the students.

My advising style is a mix of hands-on and hands-off. I use a 3-project formula to steer their research maturation. This is common in systemsy areas of CS. In project 1, I define the problem, set the direction, help with execution details, and co-write the paper. I am hands-off on coding but I review design docs and ask for code walkthroughs/demos. In project 2, the advisee co-defines the problem. I am more hands-off on execution. I give feedback on the paper (no writing). At this stage, I expect them to do their thesis proposal — project 3, defined by themselves (vetted by me). I am mostly hands-off on both execution and paper. They might publish more papers but this is my baseline. This formula is working well so far, at least for my first 2 PhD advisees, Supun and Vraj, both of whom did really well. I have learned a lot from them, both technical stuff and research persistence, as I explain in this post.

In terms of advising mechanics, I meet with all my advisees twice a week: once individually for their project and a lab meeting for status updates, discussing other papers, industry trends, etc. I do not bother my advisees outside those slots for research updates. If they want, I’ll schedule ad hoc extra chats. We use Slack for asynchronous chats — I recommend it to all faculty. Short updates are just verbal but for deeper technical chats, I like the whiteboard. That became Google Docs/slides during the pandemic. I also recommended and funded a few students who had English communication issues as non-native speakers to take UCSD Extension’s English courses. They found it helpful.

I do not pull punches with my take on my students’ work. If they do something well, I offer specific praise; if not, I offer specific blunt criticism and suggest ways to improve. Learn to be constructively critical, not caustic. IMO advisors who do not offer such feedback are failing their students in the long run. I give my students extensive feedback on all research aspects: ideas, execution, papers, posters, and talks. We have lab practice runs for major talks. When applicable, I also nominate them for competitive research fellowships and awards. An advisor’s role to me is not just being their advisee’s primary coach-critic but also their primary cheerleader when appropriate.

Finally, I have created a group culture where we value mental health, an issue that is often sadly ignored in academia and CS. I encourage my students to take regular breaks on weekends, vacation periods, and after paper deadlines to avoid burnout. I myself love vacations, of course. :) Some students have also trusted me enough as a “safe space” to share the mental health issues that affected their research/academics. I’d share my own experiences, coping tools/methods I use, and pertinent resources on campus. I am glad that many students found all this helpful.

Lesson: Students are not paper-producing machines. While productivity does matter, an advisor must go beyond that to respect and train holistic individuals.

Some junior faculty of CSE (and one spouse) at the quarterly drinks/dinner on campus in Fall 2018.

Just as you mentor students, your colleagues mentor you. My colleagues are truly one of the best parts of my academic life! CSE has a fun, collegial, and respectful culture. Each Asst Prof is assigned faculty mentors. CSE and UCSD hold themed lunches/workshops for Asst Profs on funding, teaching, student recruiting, etc. We have an annual holiday party for which students, staff, and faculty put up hilarious skits and parody videos making fun of academic life, e.g., this one and this one. :) CSE Asst Profs are also a close-knit group; we’d hang out for drinks/dinner with our spouses/SOs once every quarter. All this created a low-stress environment that helped me thrive. No wonder HDSI has also adopted much of this culture now!

Finally, I’d often ping my senior colleagues in the Database Lab for feedback on some of my proposals, papers, and blog posts. They were always helpful. Yannis, Victor, and Alin, as well as Julian, Stefan, Rajesh, and Mohan in particular gave me a lot of helpful advice over the years, especially on proposal writing, student advising, and teaching. I am not sure how I’d have fared without them! My ex-advisors (Jeff, Jignesh, and Chris) also gave me helpful advice a few times. More folks at UCSD and beyond, especially other junior faculty in the DB community, have also helped me at various stages. I am thankful to them all. I’d advise all Asst Profs to form such support networks with a mix of both senior and fellow junior faculty, in your area and nearby areas, and at your school and outside.

Lesson: They say it takes a village to raise a child. Asst Profs are similar. Choose a nurturing department. Lean on your colleagues. Form diverse support networks.

The Database Lab contingent at ACM SIGMOD 2019 in Amsterdam. With 6 research papers, UCSD was tied with Wisconsin as the largest academic DB group in the research proceedings that year! Blog post on our papers and more: https://adalabucsd.github.io/research-blog/research/2019/06/23/sigmod2019.html

Phew, that is a wrap for Part 1. Thanks for reading! Clearly I had a lot to say. :) Stay tuned for Part 2 in the next few days. I will cover Teaching and HDSI; DEI and Outreach; Personal Life and Health; and other overall observations.

EDIT: Here is the link to Part 2.


Arun Kumar

Associate Professor at UC San Diego CSE and HDSI. Research on data management and machine learning systems. Freethinker. Poet. Memester. Gay. He/him.