BMX cleaning

F. Bianchini

18 September 2022.

Scope and outline

Hello there, this posting discusses some of the efforts to "clean" the BMX data. The ultimate goal here is to perform a cross-correlation between BMX and low-z galaxies mapped out by SDSS.

The big picture

First of all, let's have a look at the data we're playing with. The pager below shows the waterfall plots corresponding to the individual BMX observations and the respective SDSS galaxy templates. Here we're using the full frequency range (cut0). The red dashed lines show the science frequency band. The region below the white dashed line indicates the "training" patch that will be used to estimate the eigenvectors that we will subtract from the data region that overlaps with SDSS as well as from the SDSS templates.

Let's establish some notation, we can call the 2d data matrix containing the intensity at each timestamp/frequency $$ T \equiv T(t, \nu)= T_{t\nu}. $$

bigpic pager — Spectrogram of the individual BMX observations and corresponding SDSS templates. The red dashed lines indicate the science frequency band and the region below the dashed white line denote the "training" patch. Note that we have downsampled the full spectrogram by a factor 10x in time (timestamps are spaced by about 9 seconds).

Spike flagging and removal

BMX data are awesome but not free of residual RFI and other junk we want to throw away.

To get rid of spikes we proceed as follows. For each time bin we:

Look at first derivative of the spectra and identify discontinuities;
Fill in using the median of neighboring pixels and take care of any NaNs;
Apply a Savitzky-Golay filter to straighten any residual kinks.

spikes pager — Outcome of the cleaning procedure. You can see both slices in frequency at three different times or the full spectrogram before/after cleaning (including their relative difference).

After cleaning, we can look at the average signal in frequency and time.

avg pager — Average signal in frequency/time before (blue line) and after (orange line) applying the despiking procedure outlined above.

Suppressing gain fluctuations

To suppress the $1/f$-like component that we see in the frequency average (attributed to gain fluctuations in the system), we proceed as follows:

Start from the "despikes" $T_{t\nu}$ and calculate the average in frequency $T_{t\nu}/\langle T_{t\nu}\rangle_{\nu}$;
Calculate power spectrum, low-pass filter (0.03 Hz) to get the 1/f looking component;
For each frequency bin, divide $T_{t\nu}/\langle T_{t\nu}\rangle_{\nu}$ by the curve obtained at the previous step.

PCA

Ok, here's the juicy part. After removing spikes and gain fluctuations from our data cube (both the training and the data patch), we move on to estimate the covariance matrix and its eigenspectrum from the training patch. Defining the intensity contrast as $$ \delta \equiv \delta(t, \nu)= \delta_{t\nu} = \frac{T_{t\nu}}{\langle T_{t\nu}\rangle_{t}}-1. $$ we can then

Calculate covariance matrix as C = ⟨δᵀδ⟩;
Get eigenvalues/spectra;
(From each time bin) deproject $N_{\rm PCA}$ components as $\delta(t) \to \delta(t)-\sum_i^{N_{\rm PCA}} \frac{\delta(t)\cdot v_i}{|v_i \cdot v_i|}v_i$

Note that we use the covariance/eigenvectors from the training patch to deproject noisy modes from the data patch!

Conclusions

We're not screwed.