
生命是一场放逐和流浪。
Life is an exile, a drifting through worlds.

只是大部分人都将自己交予了俗世,用别人和社会既定的轨道牵绊自己前行。
Most surrender themselves to the weight of the mundane—tethered to paths laid down by others, by society.

而内心的声音,早在懂得谄媚于人之前就消失殆尽;又或者,永远在耳畔孤独地回响。
The inner voice had long faded away—even before they learned to bow and flatter; or perhaps, it continues to echo in solitude by their ears forever.

有些人能清楚听见自己心灵的声音,并按这个声音生活。
Some people hear their own inner voices with great clearness. And they live by what they hear.

这样的人,不是疯了,就是成了传说。
Such people become crazy ... or they become legend.

Yuxuan Zhang GitHub LinkedIn Email

I'm a sixth-year PhD student in the CIS department at Penn. My advisor is Sebastian Angel.

I'm broadly interested in bridging the gap between data center applications and server processors through an intersection of techniques in OS, compilers, and hardware.

I currently build systems that leverage both hardware and software techniques to improve application performance at runtime.

Here are my research statement and my CV.

Education

  • PhD in Computer and Information Science, University of Pennsylvania [...]
    • Ocolos: Online COde Layout OptimizationS
      • Built Ocolos, the first online code layout optimization system for unmodified applications written in unmanaged languages.
    • RPG2: Robust Profile-Guided Runtime Prefetch Generation
      • Built RPG2, a pure-software system that operates on running C/C++ programs, profiling them, injecting prefetch instructions, and then tuning those prefetches to maximize performance.
    • Quilt: Resource-aware Merging of Serverless Workflows
      • Built a serverless optimizer that automatically merges workflows composed of many functions—potentially written in different languages—into a single process, reducing invocation latency, communication overhead, and long chains of cold starts.
  • MS in Electrical Engineering, University of Michigan, Ann Arbor [...]
    • Two-way superscalar R10K Out-of-Order processor
      • Implemented a 2-way set-associative, non-blocking write-back data cache and its cache controller, which tracks the status of outstanding cache misses.
      • Implemented key components of the OoO processor, such as the Reservation Station, hardware register map table, Reorder Buffer, and Load Store Queue.
      • Modified visual debugging tools and redesigned the testbench to support performance analysis of the OoO processor.
    • Design and Verify a Cache Coherency Protocol
      • Designed an invalidation-based MOESI self-downgrade cache coherence protocol for a multicore memory system and verified it with the enumerative model checker Murphi.
    • Wikipedia Search Engine
      • Built a scalable search engine which supports information retrieval based on both tf-idf and PageRank scores.
      • Indexed webpages with the Hadoop MapReduce framework to scale to large corpus sizes.
      • Built a new search engine interface with two special features: user-driven scoring and summarization.
  • BS in Electrical Engineering, Harbin Institute of Technology

Publications

  • Quilt: Resource-aware Merging of Serverless Workflows
    [paper] [code] [slides]
    Y. Zhang, S. Angel
    Proc. ACM Symposium on Operating Systems Principles (SOSP), Oct. 2025
  • RPG2: Robust Profile-Guided Runtime Prefetch Generation
    [paper] [code] [slides] [poster]
    Y. Zhang, N. Sobotka, S. Park, S. Jamilan, T. A. Khan, B. Kasikci, G. Pokam, H. Litz, J. Devietti
    Proc. International Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), May 2024.
  • Online COde Layout OptimizationS via Ocolos
    [paper]
    Y. Zhang, T. A. Khan, G. Pokam, B. Kasikci, H. Litz, J. Devietti.
    IEEE Micro "Top Picks From the 2022 Computer Architecture Conferences", May 2023.
  • OCOLOS: Online COde Layout OptimizationS
    [paper] [code] [slides]
    Y. Zhang, T. A. Khan, G. Pokam, B. Kasikci, H. Litz, J. Devietti.
    Proc. International Symposium on Microarchitecture (MICRO), Oct. 2022.

Employment History

  • Software Engineer Intern, Google LLC [...]
    • Technical Infrastructure Development Team, Sunnyvale, CA, 05.2025 - 08.2025
    • AI Agent for BigQuery Benchmark Generation
      • Analyzed performance bottlenecks in Google’s production workloads via profiles collected by Google-Wide Profiling (GWP).
      • Used the LLM tool AlphaEvolve to generate synthetic workloads that yield the same performance metrics as production workloads.
  • Software Engineer Intern, VMware [...]
    • Monitor Team, Boston, MA, 05.2022 - 08.2022
    • Prevalidation during Pre-copy of memory pages
      • Offloaded the pre-validation of the destination VM’s page table from the Virtual Machine Monitor (VMM) to ESXi (VMKernel, the hypervisor) after a VM is migrated from source to destination (VMotion), in order to reduce contention when updating page tables on different VMs.
      • Built prevalidation during the pre-copy of memory pages in VMotion to reduce the time spent on pre-validation.
  • Research Intern, Microsoft Research Asia [...]
    • Network Research Group, Beijing, China, 01.2018 - 07.2019
    • GLane on GPU
      • Built a Linux module that can expose an NVIDIA GPU’s physical memory for direct data transfer, and a hardware stack for GPUs in a device-centric cluster to buffer and transfer data.
      • Prototyped CUDA code to perform GPU computation and data transfer in parallel without host CPU involvement.
  • Software Engineer Intern, NVIDIA

Miscellaneous