BMX cleaning
18 September 2022.
Scope and outline
Hello there, this posting discusses some of the efforts to "clean" the BMX data. The ultimate goal here is to perform a cross-correlation between BMX and low-z galaxies mapped out by SDSS.
The big picture
First of all, let's have a look at the data we're playing with.
The pager below shows the waterfall plots corresponding to the individual BMX observations and the respective SDSS galaxy templates.
Here we're using the full frequency range (cut0
).
The red dashed lines show the science frequency band.
The region below the white dashed line indicates the "training" patch that will be used to estimate the eigenvectors that we will subtract from the data region that overlaps with SDSS as well as from the SDSS templates.
Let's establish some notation, we can call the 2d data matrix containing the intensity at each timestamp/frequency $$ T \equiv T(t, \nu)= T_{t\nu}. $$
Spike flagging and removal
BMX data are awesome but not free of residual RFI and other junk we want to throw away.
To get rid of spikes we proceed as follows. For each time bin we:
- Look at first derivative of the spectra and identify discontinuities;
- Fill in using the median of neighboring pixels and take care of any NaNs;
- Apply a Savitzky-Golay filter to straighten any residual kinks.
After cleaning, we can look at the average signal in frequency and time.
Suppressing gain fluctuations
To suppress the $1/f$-like component that we see in the frequency average (attributed to gain fluctuations in the system), we proceed as follows:
- Start from the "despikes" $T_{t\nu}$ and calculate the average in frequency $T_{t\nu}/\langle T_{t\nu}\rangle_{\nu}$;
- Calculate power spectrum, low-pass filter (0.03 Hz) to get the 1/f looking component;
- For each frequency bin, divide $T_{t\nu}/\langle T_{t\nu}\rangle_{\nu}$ by the curve obtained at the previous step.
PCA
Ok, here's the juicy part. After removing spikes and gain fluctuations from our data cube (both the training and the data patch), we move on to estimate the covariance matrix and its eigenspectrum from the training patch. Defining the intensity contrast as $$ \delta \equiv \delta(t, \nu)= \delta_{t\nu} = \frac{T_{t\nu}}{\langle T_{t\nu}\rangle_{t}}-1. $$ we can then
- Calculate covariance matrix as C = ⟨δᵀδ⟩;
- Get eigenvalues/spectra;
- (From each time bin) deproject $N_{\rm PCA}$ components as $\delta(t) \to \delta(t)-\sum_i^{N_{\rm PCA}} \frac{\delta(t)\cdot v_i}{|v_i \cdot v_i|}v_i$
Conclusions
We're not screwed.